Exploiting Online Gazetteer for Fully Automatic Extraction of Cartographic Symbols
ISBN 978-85-88783-11-9
Authors
1Chiang, Y.; 2Leyk, S.
1SPATIAL SCIENCES INSTITUTE, UNIVERSITY OF SOUTHERN CALIFORNIA Email: yaoyic@usc.edu
2DEPARTMENT OF GEOGRAPHY, UNIVERSITY OF COLORADO, BOULDER Email: stefan.leyk@colorado.edu
Abstract
State-of-the-art graphics recognition technologies for extracting cartographic symbols from scanned maps rely on a user labeling process to generate a set of shape and color descriptors for detecting the feature of interest [1–4]. In previous work, we developed an interactive approach that uses manually collected road and non-road (e.g., wetland) samples to extract road vector data and remove noise from historical USGS maps [1]. That approach reduced the overall processing time by 38% compared to manual digitization while producing accurate results, but it still required 50 minutes (including sample labeling and result curation) to process a map of 2283 × 2608 pixels. Moreover, manual sample labeling would be required for each individual map sheet, because paper maps are printed archival documents that often suffer from bleaching, blurring, and false coloring [4]. The severity of these image quality issues can vary from one map area to another, requiring additional sampling, and samples collected from one map may not be directly applicable to another. In this work, we exploit the fact that geographic information for the same area found in different data sources is not independent in order to generate training samples automatically and enable fully automatic cartographic symbol recognition. We demonstrate this approach on the recognition of hotel symbols in a map, using a gazetteer as the “dependent” knowledge source. Hotel-related information (presence and locations) found in a map and in a gazetteer may not be exactly the same, but the two sources should overlap to some extent, assuming the data are close in time. Given a scanned map covering Baghdad, Iraq (current edition), the task at hand is to find all hotel locations in the map without user intervention (i.e., without a user training the algorithms). First, the system queries GeoNames (www.geonames.org) using the map coordinates and the keyword “hotel”.
The query results contain two hotel locations (the overlapping information between the gazetteer and the map): Baghdad Hotel (33.31867, 44.41516) and Palestine Hotel (33.31539, 44.41882). Second, around each of these two locations, the system crops a sub-area of the map, assuming it contains a hotel symbol. Third, the system computes feature descriptors from the cropped areas. Fourth, the system scans the entire map for areas with similar descriptors (matching) and extracts those areas as hotel symbols. In our prior work, this descriptor matching process relied on manually selected samples of hotel symbols [2]. Using only the hotel information from GeoNames, our system extracted 13 hotel locations from the map, of which 12 are correct (precision 92.3%). The map contains 17 hotels in total, so our approach missed 5 of them (recall 70.6%). In our previous work on the same task and test data, higher precision and recall (100% and 88.2%, respectively) were achieved, but only with manually provided hotel samples from the map [2]. In conclusion, the described system performs fully automatic cartographic symbol recognition, using ancillary geographic knowledge to guide the feature sampling and extraction process. This ability to process maps without user intervention is necessary to exploit the full richness of large-volume digital historical map archives.
References
[1] Chiang, Y.-Y., Leyk, S. and Knoblock, C. A. (2013). Efficient and robust graphics recognition from historical maps. GREC LNCS 7423, p. 25-35.
[2] Chiang, Y.-Y., Chioh, P. and Moghaddam, S. (2014). A Training-by-Example Approach for Symbol Spotting from Raster Maps. In Proceedings of the 8th GIScience.
[3] Frischknecht, S. and Kanani, E. (1998). Automatic interpretation of scanned topographic maps: A raster-based approach. GREC LNCS 1389, p. 207-220.
[4] Khotanzad, A. and Zink, E. (2003). Contour line and geographic feature extraction from USGS color topographical paper maps. PAMI 25(1):18-31.
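As an illustration of steps two to four of the procedure described in the abstract (cropping around gazetteer locations, computing descriptors, and scanning the map for matches), the following is a minimal Python sketch and not the authors' implementation. It assumes a north-up map image with a known geographic bounding box, and it substitutes a coarse RGB histogram compared by histogram intersection for the feature descriptors, which the abstract does not specify. All function names, the window size, and the threshold are hypothetical.

```python
import numpy as np


def geo_to_pixel(lat, lon, bounds, shape):
    """Convert (lat, lon) to (row, col).

    Assumption: the map is north-up and linearly georeferenced, with
    bounds given as (south, west, north, east).
    """
    south, west, north, east = bounds
    rows, cols = shape[:2]
    row = int(round((north - lat) / (north - south) * (rows - 1)))
    col = int(round((lon - west) / (east - west) * (cols - 1)))
    return row, col


def crop(image, center, half):
    """Crop a square window of side 2*half+1 around center, clipped at borders."""
    r, c = center
    r0, r1 = max(r - half, 0), min(r + half + 1, image.shape[0])
    c0, c1 = max(c - half, 0), min(c + half + 1, image.shape[1])
    return image[r0:r1, c0:c1]


def descriptor(patch, bins=4):
    """Normalized joint RGB histogram (an illustrative stand-in descriptor)."""
    pixels = patch.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    return hist.ravel() / max(hist.sum(), 1)


def similarity(a, b):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return float(np.minimum(a, b).sum())


def find_symbols(image, seeds, half=8, stride=None, threshold=0.8):
    """Scan the map and return window centers that resemble any seed patch.

    seeds are (row, col) positions derived from gazetteer locations.
    """
    seed_descs = [descriptor(crop(image, s, half)) for s in seeds]
    stride = stride or (2 * half + 1)  # default: non-overlapping windows
    hits = []
    for r in range(half, image.shape[0] - half, stride):
        for c in range(half, image.shape[1] - half, stride):
            d = descriptor(crop(image, (r, c), half))
            if max(similarity(d, s) for s in seed_descs) >= threshold:
                hits.append((r, c))
    return hits


def precision_recall(n_extracted, n_correct, n_truth):
    """Precision = correct / extracted; recall = correct / ground truth."""
    return n_correct / n_extracted, n_correct / n_truth
```

With the counts reported in the abstract, `precision_recall(13, 12, 17)` gives roughly 0.923 and 0.706. Applying `geo_to_pixel` to the two GeoNames locations would additionally require the map's bounding box, which the abstract does not state.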