THE CREATION OF A MULTISCALE NATIONAL HISTORICAL GEOGRAPHIC
INFORMATION SYSTEM FOR THE UNITED STATES CENSUS
J. Schroeder, R.B. McMaster
University of Minnesota
mcmaster@umn.edu
The recently completed National Historical Geographic Information System (NHGIS) at the University of Minnesota has created a spatio-temporal database of census boundary files and associated attribute data for the entire USA (http://www.nhgis.org/). These spatial and statistical files have been created at the census-tract and county levels for the periods 1790-2000 (counties) and 1910-2000 (tracts). Based on the needs of the myriad users of
this database, including social scientists, educators, policy-makers, and
demographers, ongoing research has developed processes for the creation of
multiple-scale versions of the boundary files through a multi-step
generalization process. The primary
spatial data for the project comes from the Bureau of the Census’ TIGER files,
which were generated at a scale of approximately 1:100,000 from multiple
sources, including the United States Geological Survey’s digital line
graphs. These 2000 TIGER files were used
to generate boundaries for previous censuses through a process that involved
scanning paper maps and using other sources of historical census data.
The paper provides an overview of the NHGIS generalization framework, discussing the data model and principal algorithms used, as well as the unique challenges in maintaining topology among overlapping historical census boundaries. The main processing is divided into two parts.
The first part eliminates small areas (islands, small tract parts, and slivers
caused by historical boundary changes) according to measures of area and
area/perimeter. The second part generalizes boundaries in four steps. The first step joins feature parts that touch
each other at only one node. The second step applies the Douglas-Peucker
algorithm using a low tolerance to remove what are called insignificant points,
or those that contribute little to the geographical character of the
boundary. The third step completes line
generalization using an altered version of the Visvalingam-Whyatt algorithm
designed to maintain boundary smoothness and prevent over-reduction of small
features. The fourth step eliminates "node wedges": narrow spaces formed where
multiple feature edges intersect. A critical component within each of these
steps is the maintenance of correct topology, which requires additional
operations to prevent intersections among generalized boundaries.
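As a point of reference for the third step, the standard (unmodified) Visvalingam-Whyatt method ranks each interior vertex by its "effective area", the area of the triangle it forms with its two neighbours, and repeatedly removes the vertex with the smallest effective area until every remaining vertex meets a threshold. The following is a minimal sketch of that standard form, not the NHGIS implementation, and it omits the paper's smoothness and topology safeguards:

```python
def triangle_area(a, b, c):
    """Area of the triangle spanned by points a, b, c (as (x, y) tuples)."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def visvalingam_whyatt(points, min_area):
    """Simplify a polyline by effective area; endpoints are always kept."""
    pts = list(points)
    while len(pts) > 2:
        # Effective area of each interior vertex (areas[0] belongs to pts[1]).
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(range(len(areas)), key=areas.__getitem__)
        if areas[smallest] >= min_area:
            break          # every surviving vertex is significant enough
        del pts[smallest + 1]
    return pts
```

With a threshold of 0.5, for example, the nearly collinear vertex (1, 0.01) is removed from the line [(0, 0), (1, 0.01), (2, 0), (3, 1), (4, 0)] while the pronounced peak at (3, 1) survives; the altered version described in the paper additionally guards against over-reduction of small features.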
The entire process theoretically allows for the creation of a generalized database at any scale: changing a single input scale parameter adjusts all thresholds used in the generalization process accordingly. The paper presents examples of generalized output at several scales, details the full generalization process, and discusses remaining difficulties and directions for future research.
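The scale-parameter idea can be illustrated with a toy threshold model (the constants and functional forms here are hypothetical, not the NHGIS values): thresholds expressed in map-space millimetres are converted to ground units for any target scale, so the minimum retained area grows with the square of the scale denominator while the sliver test (area/perimeter, a rough proxy for average width) grows linearly with it.

```python
# Illustrative sketch only: derive elimination thresholds from a single
# target-scale parameter, then drop polygons that fall below a minimum
# area or are too sliver-like. K_AREA and K_SLIVER are assumed tuning
# constants, not values from the NHGIS project.

K_AREA = 0.25    # minimum symbol area on the map, in mm^2 (assumed)
K_SLIVER = 0.1   # minimum average width on the map, in mm (assumed)

def thresholds(scale_denominator):
    """Ground-unit thresholds for a map scale of 1:scale_denominator."""
    mm_to_ground = scale_denominator / 1000.0       # ground metres per map mm
    min_area = K_AREA * mm_to_ground ** 2           # grows with scale squared
    min_width = K_SLIVER * mm_to_ground             # grows linearly with scale
    return min_area, min_width

def keep_polygon(area, perimeter, scale_denominator):
    """Keep a polygon only if it passes both the area and sliver tests."""
    min_area, min_width = thresholds(scale_denominator)
    return area >= min_area and (area / perimeter) >= min_width
```

At 1:100,000 these assumed constants give a 2,500 m² minimum area and a 10 m minimum width, so a 1 km square polygon is retained while a 10 km-long, 1 m-wide sliver of comparable area is eliminated.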