THE CREATION OF A MULTISCALE NATIONAL HISTORICAL GEOGRAPHIC
INFORMATION SYSTEM FOR THE UNITED STATES CENSUS
J. Schroeder, R.B. McMaster
University of Minnesota
mcmaster@umn.edu
The recently completed National Historical Geographic Information System (NHGIS) at the University of Minnesota has created a spatio-temporal database of census boundary files and associated attribute data for the entire USA (http://www.nhgis.org/). These spatial and statistical files have been created at the census-tract and county levels for the periods 1790-2000 (counties) and 1910-2000 (tracts). Based on the needs of the myriad users of
this database, including social scientists, educators, policy-makers, and
demographers, ongoing research has developed processes for the creation of
multiple-scale versions of the boundary files through a multi-step
generalization process. The primary
spatial data for the project comes from the Bureau of the Census’ TIGER files,
which were generated at a scale of approximately 1:100,000 from multiple
sources, including the United States Geological Survey’s digital line
graphs. These 2000 TIGER files were used
to generate boundaries for previous censuses through a process that involved
scanning paper maps and using other sources of historical census data.
The paper provides an overview of the NHGIS generalization framework, discussing the data model and principal algorithms used, as well as the unique challenges in maintaining topology among overlapping historical census boundaries. The main processing is divided into two parts.
The first part eliminates small areas (islands, small tract parts, and slivers
caused by historical boundary changes) according to measures of area and
area/perimeter. The second part generalizes boundaries in four steps. The first step joins feature parts that touch
each other at only one node. The second step applies the Douglas-Peucker
algorithm using a low tolerance to remove what are called insignificant points,
or those that contribute little to the geographical character of the
boundary. The third step completes line
generalization using an altered version of the Visvalingam-Whyatt algorithm
designed to maintain boundary smoothness and prevent over-reduction of small
features. The fourth step eliminates "node wedges": narrow spaces formed where
multiple feature edges intersect. A critical component within each of these
steps is the maintenance of correct topology, which requires additional
operations to prevent intersections among generalized boundaries.
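As a point of reference for the third step, the standard (unmodified) Visvalingam-Whyatt method ranks each interior vertex by its "effective area", the area of the triangle it forms with its two neighbours, and repeatedly removes the vertex with the smallest effective area until every remaining vertex meets a threshold. The following is a minimal sketch of that standard form, not the NHGIS implementation, and it omits the paper's smoothness and topology safeguards:

```python
def triangle_area(a, b, c):
    """Area of the triangle spanned by points a, b, c (as (x, y) tuples)."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def visvalingam_whyatt(points, min_area):
    """Simplify a polyline by effective area; endpoints are always kept."""
    pts = list(points)
    while len(pts) > 2:
        # Effective area of each interior vertex (areas[0] belongs to pts[1]).
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(range(len(areas)), key=areas.__getitem__)
        if areas[smallest] >= min_area:
            break          # every surviving vertex is significant enough
        del pts[smallest + 1]
    return pts
```

With a threshold of 0.5, for example, the nearly collinear vertex (1, 0.01) is removed from the line [(0, 0), (1, 0.01), (2, 0), (3, 1), (4, 0)] while the pronounced peak at (3, 1) survives; the altered version described in the paper additionally guards against over-reduction of small features.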
The entire process theoretically allows for the creation of a generalized database at any scale: changing a single input scale parameter adjusts all thresholds used in the generalization process accordingly. The paper presents examples of generalized output at several scales, details the full generalization process, and discusses remaining difficulties and directions for future research.
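The scale-parameter idea can be illustrated with a toy threshold model (the constants and functional forms here are hypothetical, not the NHGIS values): thresholds expressed in map-space millimetres are converted to ground units for any target scale, so the minimum retained area grows with the square of the scale denominator while the sliver test (area/perimeter, a rough proxy for average width) grows linearly with it.

```python
# Illustrative sketch only: derive elimination thresholds from a single
# target-scale parameter, then drop polygons that fall below a minimum
# area or are too sliver-like. K_AREA and K_SLIVER are assumed tuning
# constants, not values from the NHGIS project.

K_AREA = 0.25    # minimum symbol area on the map, in mm^2 (assumed)
K_SLIVER = 0.1   # minimum average width on the map, in mm (assumed)

def thresholds(scale_denominator):
    """Ground-unit thresholds for a map scale of 1:scale_denominator."""
    mm_to_ground = scale_denominator / 1000.0       # ground metres per map mm
    min_area = K_AREA * mm_to_ground ** 2           # grows with scale squared
    min_width = K_SLIVER * mm_to_ground             # grows linearly with scale
    return min_area, min_width

def keep_polygon(area, perimeter, scale_denominator):
    """Keep a polygon only if it passes both the area and sliver tests."""
    min_area, min_width = thresholds(scale_denominator)
    return area >= min_area and (area / perimeter) >= min_width
```

At 1:100,000 these assumed constants give a 2,500 m² minimum area and a 10 m minimum width, so a 1 km square polygon is retained while a 10 km-long, 1 m-wide sliver of comparable area is eliminated.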