A.M. Olteanu

COGIT Laboratory, IGN France


Currently, a multiplicity of geographical information describes the same reality. This multiplicity stems from the growing volume of geographical data: data capture has become easier, and the need for geographical data and updates keeps increasing. Data are represented at different scales, are meant to be used in various applications, and come from different acquisition processes. Nevertheless, databases that represent the same reality remain independent of one another, which affects both data users and producers. Integration appears to be the answer to this problem. In order to integrate databases, redundancy and inconsistency between data must be identified. Many steps are required to complete database integration, and one of them is automatic data matching.

Our goal is to develop an automatic and generic matching algorithm that takes into account the imperfection in both geographical data and data specifications. Thus, the purpose of this paper is to model imperfection explicitly through a mathematical theory and to use that model for data matching.

Firstly, we study the taxonomies of imperfection in both the geographical information and Artificial Intelligence (AI) fields. The variety of such taxonomies is large: many concepts are in use, and since there is no standard definition of these terms, their definitions may conflict. We adopt the taxonomy commonly used in AI, which employs the concepts of imprecision, uncertainty and incompleteness.

Secondly, we focus on how to model and compute imperfection. Many probabilistic and non-probabilistic theories exist in the literature, and it appears that no single theory satisfies all applications. After briefly reviewing these theories, we present our approach based on Evidence Theory. This theory is expressed as a pair {Bel(p), Pl(p)}, where Bel(p) represents belief and Pl(p) represents plausibility. These functions are computed from the mass of belief m(p), which is calculated for each source of information. The main difficulty in using Evidence Theory is knowledge modelling. Thus, in order to initialise a belief structure, we use a probabilistic approach, i.e. the masses of belief are modelled by a Gaussian function. The standard deviation and mean of the Gaussian function are estimated using a supervised learning algorithm.
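To make the Evidence Theory machinery concrete, the sketch below computes Bel(p) and Pl(p) from mass functions and fuses two sources with Dempster's rule of combination. The frame of discernment, the candidate names (c1, c2, "none") and the mass values are illustrative assumptions, not taken from the paper's datasets; the functions themselves follow the standard Dempster-Shafer definitions.

```python
def belief(m, p):
    """Bel(p): total mass committed to non-empty subsets of p (certain support)."""
    return sum(v for a, v in m.items() if a and a <= p)

def plausibility(m, p):
    """Pl(p): total mass that does not contradict p (focal sets intersecting p)."""
    return sum(v for a, v in m.items() if a & p)

def combine(m1, m2):
    """Dempster's rule: fuse the mass functions of two independent sources."""
    fused, conflict = {}, 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                fused[inter] = fused.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb  # mass assigned to contradictory hypotheses
    norm = 1.0 - conflict  # renormalise after discarding the conflicting mass
    return {a: v / norm for a, v in fused.items()}

# Hypothetical frame of discernment: two candidate matches and "no match".
frame = frozenset({"c1", "c2", "none"})

# Source 1 (e.g. a distance criterion) supports c1, with residual ignorance.
m1 = {frozenset({"c1"}): 0.6, frame: 0.4}
# Source 2 (e.g. a toponym criterion) supports either c1 or c2.
m2 = {frozenset({"c1", "c2"}): 0.7, frame: 0.3}

m = combine(m1, m2)
p = frozenset({"c1"})
print(round(belief(m, p), 2), round(plausibility(m, p), 2))  # 0.6 1.0
```

The interval [Bel(p), Pl(p)] bounds the support for each matching hypothesis; a decision rule can then pick, say, the candidate with the highest belief.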

Finally, we present a matching algorithm based on Evidence Theory and its evaluation on two geographical datasets containing punctual (point) geographical data representing the relief.