EHASL high resolution mapping

 

 

 


EHASL — Environmental Health Advanced Systems Laboratory

 

Using GIS for Environmental Health Research

Using data from the US Census Bureau

Two main categories of census data are incorporated into GIS projects, corresponding to the two types of information discussed in the section on GIS fundamentals: spatial data and attribute data. The two are often lumped together and referred to simply as "census data," but in GIS the distinction matters. The spatial data are the maps containing the boundaries of census designations. The attribute data are the fields and records of demographic information, which can be linked to the appropriate census spatial units.

Spatial data

Census geography is a hierarchical system that descends from the coarsest unit of resolution, the entire nation, through regions, states, counties, census tracts/BNAs and block groups, down to blocks. The finest unit of resolution at which census information is available is therefore the census block, although some types of information are suppressed at this level for reasons of confidentiality. Census blocks may correspond to city blocks, but this is not always the case: in rural areas a census block may cover many square miles and have boundaries that are not streets. Census block groups are combinations of census blocks within a census tract/BNA. Census tracts are areas within metropolitan areas that are designed to be relatively homogeneous with respect to population characteristics, and they average 4,000 persons. In rural areas and counties where census tracts have not been established, the Block Numbering Area (BNA) serves as the equivalent level of census geography. Thus the tract/BNA is composed of block groups, which are themselves composed of blocks. The numbering system for these units reflects the hierarchy. For example, all of the blocks numbered 301 through 399 are part of Block Group 3 within a particular census tract.
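The numbering convention can be illustrated with a short Python sketch. It assumes only what is stated above, namely that a three-digit block number carries its block group in the leading digit. The function name and the sample block numbers are hypothetical.

```python
def block_group_of(block_number: str) -> int:
    """Return the block group implied by a three-digit census block number.

    Assumption: the leading digit of the block number is the block group,
    as in the 301-399 example (all of which belong to Block Group 3).
    """
    if len(block_number) != 3 or not block_number.isdigit():
        raise ValueError(f"expected a three-digit block number, got {block_number!r}")
    return int(block_number[0])


# Group a handful of made-up block numbers by their block group.
blocks = ["301", "317", "399", "402"]
by_group = {}
for block in blocks:
    by_group.setdefault(block_group_of(block), []).append(block)

print(by_group)
```

Grouping blocks this way mirrors how a block coverage can be aggregated up to block groups within a single tract/BNA.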

To create the spatial boundaries for sets of census units, the Census Bureau developed the TIGER file system. TIGER stands for Topologically Integrated Geographic Encoding and Referencing, which means that the components that make up the TIGER line files contain topological information about which census units lie to the left and right of each line. Using this information, GIS packages can generate polygon coverages of census blocks, block groups, tracts or other census units. Many TIGER line segments also contain address range information: for each such line in the GIS file, the associated database file records the street name and the address range (e.g. 503 through 580 Elm Street) found along that line segment. This allows the TIGER line files to be used for address matching against a list of research study subjects. Address matching is rarely perfect. The TIGER files may be incomplete, and typos and other discrepancies between the GIS database and the listing of study participants can keep match rates rather low, often in the 50-70% range on a first attempt. Some GIS address matching programs incorporate fuzzy logic, which allows the operator to judge near matches and assign them to the appropriate location. The software uses interpolation algorithms to assign an address to a point location along a line. For example, 523 Elm Street will be located about one-fourth of the way along the line segment described above.
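The interpolation and near-match steps can be sketched in Python as follows. This is a minimal illustration, not the algorithm of any particular GIS package. It assumes a straight segment with a single address range, uses made-up endpoint coordinates, and substitutes the standard library's `difflib` string similarity for the fuzzy logic mentioned above. All function names are hypothetical.

```python
import difflib


def interpolate_address(house_number, range_start, range_end, start_xy, end_xy):
    """Place a house number along a straight segment by linear interpolation.

    Returns the fraction of the way along the segment and the (x, y) point.
    """
    if range_start == range_end:
        fraction = 0.5  # single-address range: use the midpoint (arbitrary choice)
    else:
        fraction = (house_number - range_start) / (range_end - range_start)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number falls outside the segment's address range")
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return fraction, (x, y)


def near_matches(street, known_streets, cutoff=0.8):
    """Return candidate street names for an operator to review.

    Assumes known_streets are stored in upper case. The similarity measure
    is difflib's ratio, used here only as a stand-in for fuzzy matching.
    """
    return difflib.get_close_matches(street.upper(), known_streets, n=3, cutoff=cutoff)


# 523 Elm Street on a segment covering 503 through 580 (coordinates are made up).
fraction, point = interpolate_address(523, 503, 580, (0.0, 0.0), (0.0, 770.0))
print(round(fraction, 2))  # 0.26, i.e. about one-fourth of the way along

# A misspelled street name from a subject list, checked against known names.
print(near_matches("Elm Stret", ["ELM STREET", "OAK STREET", "ELMWOOD AVENUE"]))
```

The near-match function returns candidates rather than picking one automatically, which leaves the final judgment to the operator.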

To recap: census geography defines the units for which data are collected. In environmental health research, these units are often blocks, block groups or tracts. Census TIGER files are a set of line data containing topological information as well as address range information. TIGER files can be incorporated into a GIS both as line coverages and as polygon coverages, where the polygons correspond to a certain level of census geography such as block groups. In both cases, a database is attached to the spatial data elements. It holds the identifying information for the census units and can also hold attribute information of interest to the research project. For example, when a census block group coverage is generated from the TIGER files, one of the database fields (items) contains a unique identification number for each block group. The census attribute files (the STF files, discussed below) carry the same identification number as one of their database fields. A linkage is created by joining the GIS database associated with the census polygons to the STF attribute database on that shared field. The GIS command for this operation in Arc/Info is:

JOINITEM <in_info_file> <join_info_file> <out_info_file> <relate_item> <start_item> {Linear|Ordered|Link}

The <relate_item> is the common database field on which the two databases are linked.
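For readers without access to Arc/Info, the idea of joining two tables on a common field can be sketched in plain Python. This is only a conceptual illustration and does not reproduce the behavior of JOINITEM. The field names (`BG_ID`, `AREA`, `TOTAL_POP`) and all values are hypothetical, and keeping unmatched polygon records unchanged is a choice made for this sketch.

```python
def join_on(polygon_rows, attribute_rows, relate_item):
    """Attach attribute fields to polygon records that share a key value."""
    # Index the attribute table by the common field for fast lookup.
    lookup = {row[relate_item]: row for row in attribute_rows}
    joined = []
    for row in polygon_rows:
        match = lookup.get(row[relate_item])
        if match is None:
            joined.append(dict(row))  # no attribute record: keep polygon fields only
        else:
            joined.append({**row, **match})
    return joined


# Made-up polygon table (from the coverage) and attribute table (from an STF extract).
polygons = [
    {"BG_ID": "001", "AREA": 1.5},
    {"BG_ID": "002", "AREA": 2.0},
    {"BG_ID": "003", "AREA": 0.7},
]
attributes = [
    {"BG_ID": "002", "TOTAL_POP": 900},
    {"BG_ID": "001", "TOTAL_POP": 1200},
]

for record in join_on(polygons, attributes, "BG_ID"):
    print(record)
```

The order of the attribute records does not matter here because the match is made on the value of the common field, not on record position.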

Attribute data

Census attribute data are the demographic data associated with the census geography units. Traditionally, the data were published in hardcopy and bound as books held in libraries. The data are now available on tape, microfiche and CD-ROM, as well as via the WWW. In something of an anachronism, the data sets are called STF files, where STF stands for Summary Tape File. This terminology is used even when the data are delivered on CD-ROM or in other formats.

The different STF files contain different data sets at different levels of census geography. An STF file contains either the 100-percent data, which are the responses to questions asked of all census participants, or the sample data, which are the responses to questions asked of only a portion of the participants (approximately 1 in 6) and then statistically extrapolated. Because the sample questionnaire asks additional questions, the sample data contain more data fields. Two of the more commonly used STF files are STF1B and STF3A. STF1B contains the 100-percent count data at the block level. STF3A contains sample data, but only down to block group resolution. Other STF files contain data by congressional district and ZIP code. In addition, the STF files contain data for census geography units above the finest resolution in their hierarchy. For example, STF3A also has data at the tract, place, and county levels.
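The choice between the two files can be expressed as a small lookup, sketched here in Python. It encodes only what is stated above about STF1B and STF3A and omits the other STF files. The function names are hypothetical, and the sketch assumes that a file can serve any level no finer than its finest level, which should be confirmed against the documentation for the specific file.

```python
# Census geography from coarsest to finest, as listed in the Spatial data section.
LEVELS = ["nation", "region", "state", "county", "tract/BNA", "block group", "block"]

# Only the two files described in the text; other STF files are omitted.
STF_FILES = {
    "STF1B": {"data": "100-percent", "finest_level": "block"},
    "STF3A": {"data": "sample", "finest_level": "block group"},
}


def can_resolve(stf_name, level):
    """True if `level` is no finer than the finest level in the named file."""
    finest = STF_FILES[stf_name]["finest_level"]
    return LEVELS.index(level) <= LEVELS.index(finest)


def candidate_files(level, need_sample_data):
    """List the files holding the wanted data type at the wanted level."""
    wanted = "sample" if need_sample_data else "100-percent"
    return [
        name
        for name, info in STF_FILES.items()
        if info["data"] == wanted and can_resolve(name, level)
    ]


print(candidate_files("block", need_sample_data=False))
print(candidate_files("block", need_sample_data=True))
print(candidate_files("tract/BNA", need_sample_data=True))
```

The second call returns an empty list, reflecting that sample data are not available below the block group level.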

Before STF attribute data can be linked to the GIS coverages of the census geographic units, the data must be extracted from the STF files. This is usually done from the CD-ROMs of the STF files, using either the US Census Bureau EXTRACT program or other programs written in the dBase programming language. It is advisable to consult the data dictionary for the STF files to determine which data variables are needed.
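Once the variables of interest have been identified, the extraction amounts to keeping the identifier field plus the chosen variables. The Python sketch below illustrates this on a small comma-delimited table. It is not a substitute for EXTRACT and does not read the STF or dBase formats directly. The column names and values are hypothetical; real variable names should be taken from the data dictionary.

```python
import csv
import io


def select_fields(csv_text, wanted):
    """Keep only the wanted columns from a comma-delimited table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [name for name in wanted if name not in fieldnames]
    if missing:
        raise KeyError(f"columns not found in the table: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]


# Made-up table standing in for data already exported from an STF file.
exported = """BG_ID,TOTAL_POP,MEDIAN_AGE,HOUSING_UNITS
001,1200,34.1,480
002,900,29.8,350
"""

# Keep the identifier (needed for the later join) and one variable.
for record in select_fields(exported, ["BG_ID", "TOTAL_POP"]):
    print(record)
```

Retaining the identifier field in every extract is what makes the later join to the polygon coverage possible.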

 

Copyright © 2002-2004 EHASL | Last modified: 28 June 2004