Study on Rapid Parallel Algorithm based on MPI and OpenMPI for Multi-source POI Location Correction
ISBN 978-85-88783-11-9
Authors
1Wang, Y.; 2Yang, Y.; 3Zhang, F.; 4Luo, A.; 5Agen, Q.
1CHINESE ACADEMY OF SURVEYING AND MAPPING Email: wangyong@casm.ac.cn
2CHINESE ACADEMY OF SURVEYING AND MAPPING Email: yangyilyg@126.com
3CHINESE ACADEMY OF SURVEYING AND MAPPING Email: zhangfh@casm.ac.cn
4CHINESE ACADEMY OF SURVEYING AND MAPPING Email: luoan@casm.ac.cn
5CHINESE ACADEMY OF SURVEYING AND MAPPING Email: a.root@yeah.net
Abstract
With the development of the Internet technology, Internet POI information resources are increasingly abundant and current abundant POI data have been important data sources for the expansion and update of POI database. However, multi-source POI data owns inconsistent position information while the POI quantity of single data source in China is of million-order of magnitude. Current location correction methods mostly adopt serial algorithms and the efficiency of POI location correction with large data volume is in face of bottleneck, which makes it difficult to satisfy the requirements of POI rapid correction with large volume. To the best of our knowledge, however, few studies have investigated the potential of parallel computing in solving the large-scale POI location correction problem. This paper combines MPI with OpenMP and studies the parallelization idea of the serial algorithm and proposes the parallel framework of location correction algorithm to study the rapid parallelization of location correction model. Geographic range of POI point set can be divided into N*N small grids by specified geographic interval. The smaller the geographic interval is, the larger N is. Each small grid corresponds to one correction unit and N*N grid universal set constitutes correction unit set. In the correction unit set, the least square principle is utilized for location correction and the corrected results are made accuracy assessment. This paper combines MPI with OpenMP and designs double-layer parallel model of process-level and thread-level. This paper firstly distributes the tasks of each process according to the geographical range of multi-source POI. In each correction unit, the fundamental principles of the least square are adopted to resolve location correction coefficient through region splitting and evaluate the location correction accuracy. Experiment environment is Windows high-performance colony, which includes 4 nodes with two-core and four thread Intel Xeon CPU, 12GB internal storage, 256GB hard disk and Gigabit ethernet card. The software is Winodows 7 operating system, with Openmpi1.4.1, OpenMP 2.0 and MySQL database. The experiment data are two different POI data sets from different websites in Beijing area (39.26°N,115.25°E,41.03°N,117.30°E), respectively with data sizes of 183,200 and 174,500. Through sampling measurement, the error of various-source POI before correction is at kilometer level. This paper adopts different correction unit interval to split correction unit and further make location correction on various-source POI data. With 30 minutes as geographical interval, the mean square error in longitude direction is 0.88m and the mean square error in latitude direction is 0.01m, satisfying the requirements of the actual application on POI location mapping accuracy. The time consumed to location correction has decreased sharply,and the speedup has improved significantly.Comparing to the speedup of thread number 32,the speedup of thread number 64 hardly improved. Overall, location correction algorithm after the transformation of the parallel algorithm in this paper greatly decreases the correction time and achieves good speed-up ratio, which verifies the validity of the parallel algorithm in this paper. The experiment indicates that the parallel algorithm designed and realized in this paper can realize the rapid parallelization of POI location correction and the algorithm transformed by this method owns good stability and speed-up ratio. The follow-up studies shall further explore the applicability of this parallel framework in road network.