Automated geocoding of addresses from health information systems in Rio de Janeiro City
ISBN 978-85-88783-11-9
Authors
Mota, L. (1); Lemos, M.C.F. (2); Saraceni, V. (3); Quadros, E.M.S. (4); Porto, B.P.A. (5); Nascimento, E.W. (6); Galvão, I.B. (7); Rego, J.S. (8)
(1) Rio de Janeiro City Health Secretariat. Email: ludolf@ymail.com
(2) Rio de Janeiro City Health Secretariat. Email: crislemos97@gmail.com
(3) Rio de Janeiro City Health Secretariat. Email: valsaraceni@gmail.com
(4) Rio de Janeiro City Health Secretariat. Email: evanelza@gmail.com
(5) Rio de Janeiro City Health Secretariat. Email: biancapap@gmail.com
(6) Rio de Janeiro City Health Secretariat. Email: elciow@globo.com
(7) Rio de Janeiro City Health Secretariat. Email: izagalvao@gmail.com
(8) Rio de Janeiro City Health Secretariat. Email: jessica.srego@gmail.com
Abstract
Geocoding turns an address recorded in a database into a more precise description of its geographical position. The process adds value to territorial information and makes spatial analysis possible in many fields, including health. Geocoding requires cartographic bases that contain the addresses to be searched. These bases are often incomplete, either because of problems in the addresses themselves or because of long intervals between updates, which makes it hard to geocode a given database in full. During routine geocoding of the health information system databases of the Rio de Janeiro City Health Secretariat (SMS-RJ) against the official cartographic base of Rio de Janeiro City (RJC), a large number of addresses remained unmatched at the end of the process. On the one hand, the resulting thematic maps showed that the silent areas, those with no geocoded events, corresponded mainly to new development areas or slums (“favelas”). On the other hand, the health databases showed that a high proportion of cases of various health-related events occurred in those same communities. Our maps therefore did not reflect the health situation in its entirety, which limited their value for addressing health issues. Given these findings, we sought other cartographic bases that would increase the number of addresses found. One of these is the National Register of Addresses for Statistical Purposes (CNEFE), which lists addresses grouped by census tract. The census tract is the smallest territorial unit by which addresses are grouped, which makes it useful for geocoding. We then set out to develop a tool to handle spatial data that could also automatically georeference point events in the territory from given addresses, using different cartographic bases to achieve greater coverage.
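The idea of using several cartographic bases to increase coverage can be illustrated with a minimal Python sketch. It is not the tool's actual code (the tool is written in C#); the function `geocode_with_fallback`, the base names, the toy addresses and the placeholder coordinates are all assumptions made for illustration. Each base is tried in order, and the first one that knows the address supplies the coordinates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]                    # (latitude, longitude)
Lookup = Callable[[str], Optional[Coordinates]]      # returns None when not found


@dataclass
class GeocodeResult:
    lat: float
    lon: float
    base: str  # name of the cartographic base that produced the match


def dict_lookup(table: Dict[str, Coordinates]) -> Lookup:
    """Wrap a plain dictionary as a lookup function (stand-in for a real base)."""
    return lambda address: table.get(address)


def geocode_with_fallback(
    address: str, bases: List[Tuple[str, Lookup]]
) -> Optional[GeocodeResult]:
    """Try each base in order; return the first match, or None if all fail."""
    for name, lookup in bases:
        hit = lookup(address)
        if hit is not None:
            return GeocodeResult(lat=hit[0], lon=hit[1], base=name)
    return None


# Toy bases with placeholder coordinates (hypothetical, not real data).
OFFICIAL = {"RUA A, 10": (-22.90, -43.20)}
SECONDARY = {"RUA A, 10": (-22.91, -43.21), "RUA B, 20": (-22.95, -43.30)}

BASES = [
    ("official", dict_lookup(OFFICIAL)),
    ("secondary", dict_lookup(SECONDARY)),
]

if __name__ == "__main__":
    for addr in ["RUA A, 10", "RUA B, 20", "RUA C, 30"]:
        print(addr, "->", geocode_with_fallback(addr, BASES))
```

The order of the list encodes which base is preferred, so an address known to more than one base is always resolved by the earlier one.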
The Automated Geocoding Tool is a client/server system built on the object-relational database management system (ORDBMS) PostgreSQL 9.3 with the PostGIS 2.1 spatial extension, and written in C# (Microsoft .NET 4.5) using the object-oriented paradigm and object patterns as the programming model. It locates addresses automatically and quickly from an address spreadsheet (.csv) provided by the user, drawing on three cartographic bases (the official RJC base, Google and CNEFE) in a complementary way. For each address, the tool returns the geographic coordinates, the cartographic base that was used and the quality level of the geocoding. Mechanisms to correct inconsistencies in the written addresses were also introduced, so that the system can identify wrong neighborhoods, typos in street names and abbreviations. The tool increased the performance of the geocoding process up to 1,000% compared with the old manual process, making it feasible to geocode huge health databases in a short period of time and allowing 99% of the addresses to be found. As to positional accuracy, tests in RJC showed satisfactory results for the final base map that combines the census tracts with the CNEFE address tables, with a mean difference of 128 meters from the official base of the city; the Google base showed a smaller difference, around 82 meters. In summary, the Automated Geocoding Tool proved to be of great help in managing the geocoding of huge health databases: it increased the proportion of addresses found, improved the spatial analysis of health events, and improved the quality of addresses that contained errors introduced during data entry, before geocoding.
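The handling of abbreviations and typos in street names, together with a quality level for each match, could look roughly like the Python sketch below. This is an illustration under stated assumptions, not the tool's implementation: the abbreviation table, the street list, the three quality labels and the 0.85 similarity cutoff are all hypothetical, and approximate matching is done here with the standard library's `difflib`, which the abstract does not mention.

```python
import difflib
import unicodedata
from typing import List, Optional, Tuple

# Hypothetical abbreviation table for the street-type word.
ABBREVIATIONS = {"R": "RUA", "AV": "AVENIDA", "ESTR": "ESTRADA", "TRAV": "TRAVESSA"}

# Hypothetical reference list of street names, already normalized.
STREETS = ["RUA DAS LARANJEIRAS", "AVENIDA BRASIL", "RUA VOLUNTARIOS DA PATRIA"]


def normalize(text: str) -> str:
    """Strip accents, upper-case, drop periods and expand a leading abbreviation."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    words = no_accents.upper().replace(".", " ").split()
    if words:
        words[0] = ABBREVIATIONS.get(words[0], words[0])
    return " ".join(words)


def match_street(
    raw: str, known: List[str], cutoff: float = 0.85
) -> Tuple[Optional[str], str]:
    """Return (matched street or None, quality label)."""
    cleaned = normalize(raw)
    if cleaned in known:
        return cleaned, "exact"
    # Fall back to the single most similar name above the cutoff, if any.
    close = difflib.get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    if close:
        return close[0], "approximate"
    return None, "not found"


if __name__ == "__main__":
    for raw in ["R. das Laranjeiras", "Av Brazil", "Rua Voluntários da Pátria"]:
        print(raw, "->", match_street(raw, STREETS))
```

Reporting the label alongside the match lets a later analysis keep or discard approximate matches; a stricter cutoff trades fewer wrong matches for more addresses left unmatched.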
Keywords
geocoding; automated tool; Rio de Janeiro