EHASL — Environmental Health Advanced Systems Laboratory
Using GIS for Environmental Health Research
Fundamentals of GIS data
GIS data sets consist of both spatial information and attribute information. The spatial information is in the form of maps or grids, which are linked to the attribute information stored in a database.
Spatial data
A spatial data model describes the way in which the space of the "real world" is transformed into the digital format of the GIS. The two most common spatial data models used in GIS are raster and vector.
RASTER: The raster data structure is a systematic tessellation of space. The standard raster is a grid cell array. In this data model every cell has a value, and the location of each cell is implicit in the structure of the grid. Examples include Digital Elevation Models (DEMs) and Landsat remotely sensed images. In remotely sensed imagery, each cell is referred to as a pixel, a contraction of "picture element." The areal extent each cell or pixel represents is referred to as the resolution, which is one of the scale issues important in developing GIS projects. Common resolutions of remotely sensed data are Landsat Thematic Mapper data at 30 meter resolution (an areal extent of 30 x 30 meters or 900 square meters), and NOAA AVHRR data, which are available at a 1 km resolution (an areal extent of 1 x 1 km or one square kilometer). These different resolutions suit different applications: fine-resolution data support local studies, while coarse-resolution data are more practical for regional or continental coverage.
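The idea that location is implicit in the grid can be sketched in a few lines of Python. This is only an illustration, not how any particular GIS stores rasters: the Raster class, its field names, the origin coordinates and the cell values are all made up for the example. Only the 30 meter cell size comes from the Landsat Thematic Mapper figure above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Raster:
    """A grid cell array; cell locations are implied by the grid layout."""
    values: List[List[int]]  # values[row][col]; row 0 is the northern edge
    x_min: float             # x coordinate of the western edge (hypothetical)
    y_max: float             # y coordinate of the northern edge (hypothetical)
    cell_size: float         # resolution in meters, square cells assumed

    def cell_area(self) -> float:
        """Areal extent represented by one cell, in square meters."""
        return self.cell_size * self.cell_size

    def cell_center(self, row: int, col: int) -> Tuple[float, float]:
        """Derive a cell's center from its row/column position alone."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return (x, y)

    def value_at(self, x: float, y: float) -> int:
        """Look up the value of the cell that contains coordinate (x, y)."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise IndexError("coordinate lies outside the raster extent")
        return self.values[row][col]


# A tiny 2 x 3 grid at 30 m resolution with made-up values and origin.
tm = Raster(values=[[1, 2, 3], [4, 5, 6]],
            x_min=500000.0, y_max=4400000.0, cell_size=30.0)

print(tm.cell_area())                      # 900.0 square meters per cell
print(tm.cell_center(0, 0))                # (500015.0, 4399985.0)
print(tm.value_at(500070.0, 4399950.0))    # 6
```

No coordinates are stored with the individual cells; the origin and the cell size are enough to recover every location.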
VECTOR: The vector data structure uses a continuous coordinate space containing data elements of points, lines and polygons. GIS vector layers are often referred to as coverages. Thus a coverage may consist of points (e.g. hospital locations), lines (e.g. road networks) or polygons (e.g. census block group boundaries). Vector coverages may be derived from scanned or digitized maps, which themselves are produced at a certain cartographic scale. For example, USGS 7.5 minute quad maps are produced at a scale of 1:24,000. This means that each unit on the map corresponds to 24,000 of the same units on the ground. Associated with a specific cartographic scale is a set of production guidelines for quality and accuracy. It is important to be aware of these accuracy issues when using GIS data from different original scales, since they influence the uncertainty inherent in any analyses performed using the data.
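The arithmetic behind a cartographic scale can be worked through in a short Python sketch. The function name is hypothetical, and the 0.5 mm line width is an assumed figure chosen only to show why the source scale matters for accuracy; it is not taken from any production guideline.

```python
def ground_distance(map_distance: float, scale_denominator: float) -> float:
    """Convert a distance measured on the map to the same units on the ground."""
    return map_distance * scale_denominator


SCALE_24K = 24_000  # USGS 7.5 minute quad maps, 1:24,000

# One inch on the map is 24,000 inches, i.e. 2,000 feet, on the ground.
inches_on_ground = ground_distance(1.0, SCALE_24K)
feet_on_ground = inches_on_ground / 12
print(feet_on_ground)      # 2000.0

# One centimeter on the map is 240 meters on the ground.
meters_on_ground = ground_distance(1.0, SCALE_24K) / 100
print(meters_on_ground)    # 240.0

# An assumed 0.5 mm drawn line already covers 12 m on the ground, so
# features digitized from the map cannot be located more finely than that.
line_width_m = ground_distance(0.5, SCALE_24K) / 1000
print(line_width_m)        # 12.0
```

The same function applied to a 1:250,000 map gives a ground distance roughly ten times larger for the same map measurement, which is one reason coverages from different source scales should be combined with care.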
Also important are the metadata, i.e. the "data about data" which are associated with a GIS coverage. The metadata give information on how the data were produced or derived.
Attribute data
Attribute data are the informational data which are linked to the spatial elements in a GIS data set, i.e. the database information. For raster data, the attribute data are linked to each cell or pixel. For vector data, the attributes are linked to the points, lines or polygons, as appropriate for the coverage type. Most GIS packages store the attribute data in some sort of relational database, which is simply a set of tables containing fields and a listing of the records or occurrences. For example, a point coverage of hospital locations may have an associated database with fields for Hospital Name, Street Address, Owner, Number of Beds, etc. Using the georeferencing capabilities of the GIS, fields for latitude and longitude could be created and those coordinates calculated and inserted into the database.
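The link between spatial elements and a relational attribute table can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea, not the way Arc/Info or any other GIS stores its tables: the table name, column names, hospital records and coordinates are all invented for the example.

```python
import sqlite3

# Spatial side: point features keyed by a feature ID, stored as
# (longitude, latitude). The coordinates are made up.
points = {
    1: (-105.08, 40.57),
    2: (-104.99, 39.74),
}

# Attribute side: one record per feature, linked by the same feature ID.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE hospital (
           feature_id     INTEGER PRIMARY KEY,
           name           TEXT,
           street_address TEXT,
           owner          TEXT,
           beds           INTEGER,
           longitude      REAL,
           latitude       REAL
       )"""
)
conn.executemany(
    "INSERT INTO hospital (feature_id, name, street_address, owner, beds) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Hospital A", "100 Example St", "County", 250),
        (2, "Hospital B", "200 Sample Ave", "Private", 80),
    ],
)

# Fill the coordinate fields from the spatial data, as a GIS would do
# with its georeferencing tools.
conn.executemany(
    "UPDATE hospital SET longitude = ?, latitude = ? WHERE feature_id = ?",
    [(lon, lat, fid) for fid, (lon, lat) in points.items()],
)

# An attribute query: which hospitals have more than 100 beds, and where?
rows = conn.execute(
    "SELECT name, beds, longitude, latitude FROM hospital "
    "WHERE beds > 100 ORDER BY name"
).fetchall()
print(rows)
```

The shared feature ID is what ties each table record to its point on the map, so a query on the attributes can be answered with locations.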
Getting Data into a GIS
Attribute data are available in different formats, so data may need to be manipulated before they can be brought into the GIS database. Data in an ASCII format are relatively straightforward to bring in. Data in a spreadsheet format (Quattro, Excel) can be translated into either an ASCII format or a dBase format for importing. dBase files can be directly translated into INFO format, which is the database used by the Arc/Info software. Other common database formats are Oracle and Informix. SAS data files are less easily imported; usually we translate from SAS to dBase and then import. Examples of attribute data include the US Census Bureau STF files, which are in a dBase format and are available either on CD-ROM or via the World Wide Web. Web sites such as those of the EPA or CIESIN offer other sorts of environmental and demographic attribute data.
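As a rough sketch of why ASCII data are straightforward to bring in, the following Python reads a comma-delimited ASCII table into typed records using only the standard library. The field names, the sample values and the read_ascii_table helper are hypothetical, and a real import into a GIS database would go through that package's own import tools.

```python
import csv
import io

# A small comma-delimited ASCII table with made-up census-style values.
ascii_text = (
    "tract_id,population,median_age\n"
    "08069000100,4210,31.5\n"
    "08069000200,3875,29.8\n"
)


def read_ascii_table(text, field_types):
    """Parse delimited ASCII text into a list of records with typed fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: field_types[name](value) for name, value in row.items()}
        for row in reader
    ]


records = read_ascii_table(
    ascii_text,
    # Identifiers are kept as text so that leading zeros survive the import.
    {"tract_id": str, "population": int, "median_age": float},
)

for record in records:
    print(record)
```

Declaring a type for each field up front mirrors what a database import requires, and it avoids the common problem of identifier codes being silently converted to numbers.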
Spatial data are also available in several formats, including digital, paper and mylar maps. They may be in a generic digital format which needs to be translated into the specific format for your GIS, or coverages may be available in an export format which can be brought into the GIS directly. Common export formats include those of GIS software packages such as Arc/Info, MapInfo, Atlas GIS, Intergraph, and GRASS. Other options include scanning a map using a scanner connected to the GIS, or digitizing, in which a digitizing tablet is used to set up a correspondence between the paper map on the tablet and the digital file in the computer. If possible, avoid digitizing! Check with WWW sites or researchers in your field to find out if a coverage of appropriate scale and extent is already available in digital format for your study.
Scaling issues
It is important that the data sets you are using and combining in a GIS are at appropriate and compatible scales. Use the metadata associated with a data set to help you determine this. The metadata may include the cartographic scale (e.g. the 1:24,000 scale USGS quad data) or the cell or pixel resolution. Also to be considered is the geographic scale, or extent, of a data set. This refers to the area which a data set encompasses, e.g. the state of Colorado or the area within the city limits of Denver. Many physical data sets are available in data tiles, which are systematic divisions of a region of interest. The USGS tiles the conterminous 48 states into 1:250,000 scale tiles, each of which encompasses an areal extent of 2 degrees longitude by 1 degree latitude. For example, land use/land cover data are available at this scale. Beware Murphy's Law of Maps, which states that an area of interest tends to lie at the intersection between at least 2 and usually 4 maps!
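Murphy's Law of Maps can be checked before ordering data by working out which tiles a study area touches. The Python sketch below assumes the tiles are aligned to even degrees of longitude and whole degrees of latitude, and it identifies each tile by its southwest corner; both are assumptions made for illustration, so check the actual tile index of the data set you are using. The study area extent is hypothetical.

```python
import math

TILE_WIDTH_DEG = 2   # degrees of longitude per tile
TILE_HEIGHT_DEG = 1  # degrees of latitude per tile


def tile_origin(lon, lat):
    """Return the southwest corner of the tile containing (lon, lat)."""
    west = math.floor(lon / TILE_WIDTH_DEG) * TILE_WIDTH_DEG
    south = math.floor(lat / TILE_HEIGHT_DEG) * TILE_HEIGHT_DEG
    return (west, south)


def tiles_for_extent(west, south, east, north):
    """List the southwest corners of every tile a bounding box touches.

    A box edge lying exactly on a tile boundary counts as touching the
    neighboring tile.
    """
    w0, s0 = tile_origin(west, south)
    w1, s1 = tile_origin(east, north)
    return [
        (w, s)
        for s in range(s0, s1 + 1, TILE_HEIGHT_DEG)
        for w in range(w0, w1 + 1, TILE_WIDTH_DEG)
    ]


# A hypothetical study area only half a degree wide and 0.4 degrees tall.
tiles = tiles_for_extent(west=-104.3, south=39.8, east=-103.8, north=40.2)
print(tiles)       # [(-106, 39), (-104, 39), (-106, 40), (-104, 40)]
print(len(tiles))  # 4
```

Even though this study area is far smaller than a single tile, it straddles both a tile edge in longitude and one in latitude, so four tiles are needed to cover it.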
What is a good GIS software package to use?
One answer to that question is "what do you have access to." Other considerations include your workstation platform (PC or UNIX). If you will be sharing your data sets with other researchers, you may want to consider whether or not there is a conversion between their data types and yours. An additional consideration is which GIS packages are currently popular. While the popularity of a GIS package is not necessarily an indication of its quality, a large user base increases the chances that you will be able to obtain useful data sets and exchange data with other researchers. One of the more popular GIS packages currently in use is Arc/Info, which is distributed by a company called ESRI. ESRI also makes a GIS called ArcView, which is a lot more user-friendly than Arc/Info and can also use Arc/Info data files. Both of these packages run in both the PC and UNIX environments. In addition, there are MANY data sites on the WWW which contain Arc/Info data, distributed in the export format as files with names of the form <covername>.e00. Watch for these types of files on the Web.
Other PC GIS packages include MapInfo, Atlas GIS and IDRISI. ESRI recently bought out Atlas GIS, so there is increasing compatibility between these data formats. IDRISI was developed as more of a raster-based GIS, but it also has vector capabilities. MapInfo and Atlas GIS are more vector-based. All of these companies offer student versions and prices.
Image processing is a specialized application in GIS which allows manipulation and display of remotely sensed imagery. A company called ERDAS produces a package called IMAGINE, which is highly functional (and very pricey) image processing software. A company called MicroImages makes TNTmips, a PC-based image processing package which is either free or relatively inexpensive in its student version. IDRISI (mentioned above) also does image processing, although there may be a file size limit.
NOTE: the above does not imply any endorsement of the products mentioned, only some opinions based on my experience as part of the GIS community.
Where do I get GIS data for my Environmental Health project?
There are several avenues for obtaining GIS data, both spatial and attribute. Some of the more commonly used data sets are obtained from the US Census Bureau. These data sets are available both on CD-ROM and on the World Wide Web. Other sources of data include government agencies at the federal, state or local level. Some states have well-established WWW sites that contain data sets in the Arc/Info export (*.e00) format. Other data transfers are arranged by disk or tape. One of the least desirable methods for incorporating data is having to generate them yourself, either by digitizing or by data entry. Sometimes, however, this can't be avoided.
In addition to the avenues mentioned above, there are many data vendors that will massage data sets and produce custom data in the format you require, often for a hefty fee. Sometimes, however, purchasing such data may be the most cost-efficient method of getting a data set, compared to the work and time involved in doing it oneself.
Copyright © 2002-2004 EHASL | Last modified: 28 June 2004


