DESIGN ASPECTS OF A THESAURUS FOR THEMATIC SEARCH IN GEOSPATIAL METADATA SERVICES
P. Ahonen-Rainio, K. Korhonen
In the context of spatial data infrastructures, one of the key questions is how users can discover potentially suitable geographic datasets among rapidly increasing data resources. Keywords conventionally form the basis for this kind of searching. Geographic datasets originate from a wide variety of disciplines, which implies a rich diversity of keywords. Ideally, the management of keywords for searching geographic data could be based on an extensive ontology covering this variety of disciplines, but in practice, creating such an ontology is not within easy reach. Therefore, a thesaurus of thematic keywords was compiled for use in the Finnish national geospatial metadata service. It contains about 400 keywords in a hierarchical structure and extends over the thematic fields covered by the EU INSPIRE directive as well as the themes listed in the ISO 19115:2003 metadata standard for geographic information.
The design principles of the thesaurus aimed at covering the different fields of geographic data equally, making the thesaurus easy to use for both authors and users of metadata, and benefiting from the existing thesauri of various disciplines. The difficulties of following these seemingly obvious principles are documented in the paper. The work started with an investigation of potential thesauri; terms were then selected for each thematic field iteratively, with comments from an expert group. The experience gained in this project is distilled into recommendations that cover both the selection of terms for a thesaurus and the design procedure.
The main theoretical question in the design process concerned the dependence of thematic terms on the discipline from which they originate. Because the software used for the metadata service could manage only hierarchical structures, this dependence made the design more challenging: the structure had to be kept very simple. Another fundamental question in searching is the level of detail that the keywords should reach in order to ensure the most relevant search results. The keywords of data providers should match the keywords of potential users of the data. A search for relevant data fails when keywords are either too detailed or too general. In this design process, differences in the appropriate level of detail were found between different fields of geographic data.
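The mismatch between provider and user keywords can be illustrated with a minimal sketch. It is not the software used in the metadata service, and the terms, the dataset names, and the functions `narrower_terms` and `search` are hypothetical. It assumes a purely hierarchical thesaurus stored as a child-to-parent map and shows how a general query term misses datasets tagged with detailed keywords unless the query is expanded down the hierarchy.

```python
from collections import defaultdict

# Hypothetical fragment of a keyword hierarchy: term -> broader term.
# None marks a top-level term. These are not terms from the actual thesaurus.
BROADER = {
    "hydrography": None,
    "lakes": "hydrography",
    "rivers": "hydrography",
    "lake depth contours": "lakes",
}

# Hypothetical metadata records: dataset name -> keywords given by the provider.
DATASETS = {
    "Lake bathymetry": ["lake depth contours"],
    "River network": ["rivers"],
    "Road network": ["roads"],
}


def narrower_terms(term, broader=BROADER):
    """Return the term itself plus every term below it in the hierarchy."""
    # Invert the child -> parent map into parent -> children.
    children = defaultdict(list)
    for child, parent in broader.items():
        if parent is not None:
            children[parent].append(child)

    # Depth-first walk down the hierarchy starting from the query term.
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        found.add(current)
        stack.extend(children[current])
    return found


def search(query, datasets=DATASETS, expand=True):
    """Return sorted names of datasets whose keywords match the query.

    With expand=True the query also matches all narrower terms;
    with expand=False only an exact keyword match counts.
    """
    terms = narrower_terms(query) if expand else {query}
    return sorted(name for name, kws in datasets.items() if terms & set(kws))
```

In this sketch, an exact-match search for the general term "hydrography" finds nothing, because the providers used more detailed keywords, while the expanded search finds both water-related datasets. The opposite failure, a user querying with a term more detailed than any provider keyword, would require expansion toward broader terms, which this sketch does not attempt.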