Hybrid Classification Method for Multi-source POIs Based on Semantic Analysis
ISBN 978-85-88783-11-9
Authors
1Liu, J.; 2Luo, A.; 3Zhang, F.; 4Wang, Y.
1CHINESE ACADEMY OF SURVEYING AND MAPPING Email: liujp@casm.ac.cn
2CHINESE ACADEMY OF SURVEYING AND MAPPING Email: luoan@casm.ac.cn
3CHINESE ACADEMY OF SURVEYING AND MAPPING Email: zhangfh@casm.ac.cn
4CHINESE ACADEMY OF SURVEYING AND MAPPING Email: wangyong@casm.ac.cn
Abstract
With the growing number of Internet users and mobile devices in recent years, the number of POIs (points of interest), which serve as special point locations on the Web, is increasing rapidly. POIs are provided by many different websites, such as Google Maps, Bing Maps, and Baidu Maps. To better exploit POIs from different websites, several problems of multi-source data fusion need to be overcome. The first of these is classifying POIs automatically. In this paper, we present a classification method for multi-source POIs based on semantic analysis, which matches on three aspects of POI information: category tag, name, and attributes.
1) Category tag matching. A category tag is a means by which data providers manage POIs. It is assigned manually according to the purpose of the POI or the way people use it, and it reflects accurately and reliably what the POI is and how it is used. A category is not just a short text describing a function; it is also one of the categories in a taxonomy that defines the hierarchical relationships between categories. Category tag matching is the process of mapping a category in one taxonomy to a category in a specified taxonomy. Because taxonomies are never exactly the same, a category in one taxonomy cannot always be matched exactly with a category in the specified taxonomy. Sometimes one category maps to several categories in the other taxonomy.
2) Name matching. Every POI has a name, which people use to find it or to navigate when touring and driving. From the name alone, people can usually tell what a POI is or how it is used, because different categories follow different naming rules. To cover these naming rules, we adopt a hybrid method for name matching.
(a) Keyword matching: for the few categories whose names always contain special keywords that names in other categories do not contain. (b) Center-word matching: for categories whose center word identifies the function of the POI. (c) Regular expression matching: for categories whose names can be expressed as combinations of words.
3) Attribute matching. After category tag and name matching, a POI may still match more than one category. To classify more accurately, we refine the information relevant to the category of the POI. Because different categories of POI have different characteristics and properties, attribute information is identified and extracted. Attributes can be refined from the descriptive information of POIs, and they describe the characteristics, functions, and uses of POIs in more detail.
This semantic method can classify POI datasets quickly and automatically and saves a great deal of manual labor. To demonstrate the effectiveness of the method, a prototype system was designed and implemented; it shows that the proposed method yields high-quality results under realistic conditions.
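The three matching stages described above can be illustrated with a minimal Python sketch. It is not the prototype system itself. The taxonomy entries, keyword lists, center words, regular expression, and attribute cues are all hypothetical placeholders, and so is every function name. Two simplifications are assumptions of this sketch rather than statements of the abstract: the center word is taken naively as the last token of the name, and the tag and name results are combined by intersection, falling back to their union when the intersection is empty.

```python
import re

# Hypothetical mapping from a source taxonomy to a target taxonomy.
# One source category may map to several target categories.
TAG_MAP = {
    "Food & Drink": {"Restaurant", "Cafe"},
    "Lodging": {"Hotel"},
}

# (a) Keywords assumed to occur only in names of a single category.
KEYWORDS = {"Hotel": ["hotel", "inn"], "Cafe": ["cafe", "coffee"]}

# (b) Center words; the center word is naively taken as the last token.
CENTER_WORDS = {"restaurant": "Restaurant", "hospital": "Hospital"}

# (c) Regular expressions for names built from word combinations.
NAME_PATTERNS = {"School": re.compile(r"\b(primary|middle|high)\s+school\b")}

# Attribute cues expected in a POI description, per category.
ATTRIBUTE_CUES = {
    "Restaurant": ["menu", "cuisine"],
    "Cafe": ["espresso", "latte"],
    "Hotel": ["rooms", "check-in"],
}


def match_tag(tag):
    """Stage 1: map a provider tag to candidate target categories."""
    return set(TAG_MAP.get(tag, ()))


def match_name(name):
    """Stage 2: hybrid name matching (keyword, center word, regex)."""
    lowered = name.lower()
    tokens = re.findall(r"[a-z]+", lowered)
    found = set()
    for category, words in KEYWORDS.items():
        if any(word in tokens for word in words):
            found.add(category)
    if tokens and tokens[-1] in CENTER_WORDS:
        found.add(CENTER_WORDS[tokens[-1]])
    for category, pattern in NAME_PATTERNS.items():
        if pattern.search(lowered):
            found.add(category)
    return found


def refine_by_attributes(candidates, description):
    """Stage 3: keep the candidates whose cues occur most often."""
    text = description.lower()
    scores = {
        c: sum(cue in text for cue in ATTRIBUTE_CUES.get(c, []))
        for c in candidates
    }
    best = max(scores.values(), default=0)
    if best == 0:
        return set(candidates)  # no evidence; leave the ambiguity as is
    return {c for c, s in scores.items() if s == best}


def classify(poi):
    """Run the three stages on a POI given as a dict."""
    tag_hits = match_tag(poi.get("tag", ""))
    name_hits = match_name(poi.get("name", ""))
    # Assumption: prefer categories both stages agree on, else take either.
    candidates = (tag_hits & name_hits) or (tag_hits | name_hits)
    if len(candidates) > 1:
        candidates = refine_by_attributes(candidates, poi.get("description", ""))
    return candidates
```

In this sketch a POI tagged "Food & Drink" whose name gives no clue remains ambiguous between "Restaurant" and "Cafe" after the first two stages, and the attribute stage resolves it from words in the description. A real system would need per-provider taxonomies and language-specific tokenization, which are outside the scope of this illustration.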
Keywords
multi-source POIs; hybrid classification; semantic analysis