Locating and Characterizing the Borders Between Ecoregions Using Multivariate Geographic Clustering

William W. Hargrove and Forrest M. Hoffman





Note: An abbreviated version of this document was published in Computing in Science and Engineering 1(4):18-25.

Ecologists have long used a specialized form of visualization based on massive data sets: the delineation and mapping of ecoregions (Bailey 1983, Omernik 1987). Ecoregions are areas within which similar combinations of environmental characteristics occur. One familiar and useful set of ecoregions is the USDA Plant Hardiness Zone map, which gardeners use to select plants and shrubs appropriate for landscaping in particular areas of the United States. Growing conditions within a single Hardiness Zone are more similar to one another than are conditions in two different Zones. Like the Plant Hardiness Zones, ecoregion classifications are based on particular environmental conditions and designed for specific purposes, so no single set of ecoregions is appropriate for all potential uses. The ecoregion concept is one of the most important in landscape ecology, both as a framework for management and as an aid to understanding (Omernik and Bailey 1997, Omernik 1995).

Unfortunately, ecologists have struggled with exactly how and where to locate the dividing lines between ecoregions (Bailey 1983, 1996; Omernik and Bailey 1997). Historically, regionalization (drawing borders to delineate ecoregions) has been performed subjectively by experts, who attempt to integrate and weigh all of the environmental characteristics using their expertise. An expert may not be able to explain exactly why or how he or she placed a border at a particular spot. This subjectivity leads to frequent revisions (Bailey 1994, 1995, 1996, 1998) and disagreements over particular locations (CEC 1997), and it hampers widespread acceptance and use. In fact, experts are unable to reliably distinguish actual ecoregion maps from synthetic maps simulated using a fractal technique (Hargrove et al. 1997).

Ecotone vs. Ecopause

Part of the problem is the variable nature of the borders between ecoregions. In some places, borders are sharp and distinct, and it is literally possible to stand with one foot in one region and the other in a clearly different one. Such unequivocal borders are easy to locate, and ecologists term them "ecotones," since they represent sharp "cuts." However, most situations are more like M.C. Escher's woodcut "Sky and Water I," in which black birds at the top of the image slowly transform into white fish at the bottom. Although we can all agree that the picture contains two distinctly different creatures, we would have difficulty locating an exact line of demarcation between them. For these gradually transitioning edges, we define a new term, "ecopause," to indicate the indistinct nature of such borders. Indeed, a border can begin at one geographic location as an ecotone and then transform slowly along its length into an ecopause. Unfortunately, ecologists have had only simple lines with which to visualize these many types of borders. In this paper, we suggest an alternative way to depict borders, one that also portrays their instantaneous sharpness at every point along the line.

Multivariate Geographic Clustering to Define Ecoregions

Locating borders between ecoregions is truly a multivariate decision process that must draw on a number of large geographic data sets, one for each environmental condition to be considered. We have developed an objective technique, combining multivariate statistics with a Geographic Information System (GIS), that computes the placement of borders between ecoregions given maps of all the environmental conditions one wishes to consider. Rather than relying on expertise, our technique uses the standardized values of each environmental condition in each raster cell of the map as a set of coordinates. These coordinates specify a position for that cell in an environmental data space with as many dimensions as there are environmental characteristics. Two raster cells from anywhere in the map that have similar combinations of environmental characteristics will lie near each other in data space, and their nearness and relative positions quantitatively reflect their environmental similarity.
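As a minimal sketch of how cells become points in data space, the Python below z-score standardizes each layer (subtracting its mean and dividing by its population standard deviation) and measures Euclidean distance between cells. The source says only that "standardized values" are used; the z-score choice, the function names, and the toy data are assumptions for illustration, not the authors' C implementation.

```python
import math
from statistics import mean, pstdev


def standardize(layers):
    """Z-score each environmental layer and return one point per raster cell.

    `layers` holds one list of cell values per environmental variable, all in
    the same cell order. The result has one coordinate tuple per cell, with as
    many dimensions as there are layers.
    """
    scaled = []
    for values in layers:
        mu, sd = mean(values), pstdev(values)
        # A constant layer carries no information, so it maps to zero.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    # Transpose from variable-major lists to one tuple per cell.
    return [tuple(cell) for cell in zip(*scaled)]


def distance(a, b):
    """Euclidean distance between two cells in environmental data space."""
    return math.dist(a, b)


# Hypothetical toy map: four cells described by two layers.
elevation = [100.0, 110.0, 900.0, 950.0]
precip = [1200.0, 1180.0, 400.0, 420.0]
cells = standardize([elevation, precip])
```

Standardizing first keeps a layer measured in large units (elevation in meters) from dominating one measured in small units; after scaling, the first two toy cells lie close together and far from the last two, regardless of where they sit on the map.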

After being disassembled from geographic space, the map cells are re-plotted in environmental data space like stars in a data universe. Because the density of these cells in data space is not uniform, we use an iterative classification procedure to group nearby "stars" into clusters having similar combinations of environmental conditions. The procedure begins with the user specifying the desired number of "galaxies," or clusters, into which the stars are to be grouped. All observations are examined sequentially to find the most widely separated set of stars, which provides this number of initial cluster "seeds." Thus, the number of ecoregions that results from the process is under the user's control.
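The seed-finding step can be illustrated with a single-pass maximin heuristic. The rule here is an assumption, since the source does not specify how the widely separated seeds are found: a new point replaces one seed when it lies farther from every seed than the two closest seeds lie from each other. The function name find_seeds is hypothetical.

```python
import math
from itertools import combinations


def find_seeds(points, k):
    """Pick k widely separated seeds in one sequential pass (assumed maximin rule)."""
    seeds = []
    for p in points:  # start from the first k distinct points
        if p not in seeds:
            seeds.append(p)
            if len(seeds) == k:
                break
    if len(seeds) < k:
        raise ValueError("fewer distinct points than requested seeds")
    if k < 2:
        return seeds
    for p in points:
        if p in seeds:
            continue
        # The two seeds currently closest to each other.
        i, j = min(combinations(range(k), 2),
                   key=lambda ij: math.dist(seeds[ij[0]], seeds[ij[1]]))
        gap = math.dist(seeds[i], seeds[j])
        if min(math.dist(p, s) for s in seeds) > gap:
            # p is farther from every seed than that pair is from each other,
            # so it replaces whichever member of the pair lies nearer to it.
            seeds[i if math.dist(p, seeds[i]) < math.dist(p, seeds[j]) else j] = p
    return seeds
```

Each replacement can only keep or increase the smallest distance between any two seeds, so the seeds spread out as the pass proceeds. For cells at 0, 1, 2, 50, and 100 on one axis with k = 3, the seeds end at 0, 50, and 100.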

Each map cell is then compared against all cluster seeds and assigned to the cluster whose seed is closest to it in Euclidean distance. After all map cells have been assigned, new cluster centroids are calculated as the mean of each coordinate over all cells assigned to that cluster, and the assignment procedure repeats using these centroids in place of the seeds. Stars do not move in environmental data space; rather, the centroids of the cluster "galaxies" gradually slew until an equilibrium classification is obtained. When fewer than a specified number of map cells change cluster assignment in an iteration, the process is considered converged and halts.
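The assignment-and-update loop described above is essentially k-means. The sketch below is a serial Python illustration, not the authors' parallel C/MPI code; the parameter min_changes stands in for the "specified number" of changed cells, and max_iter is an added safeguard we assume.

```python
import math


def cluster(points, seeds, min_changes=1, max_iter=100):
    """Iterative nearest-centroid clustering (k-means style).

    Halts when fewer than `min_changes` cells change cluster in one pass;
    `max_iter` is an added safeguard. Returns (labels, centroids).
    """
    centroids = [tuple(s) for s in seeds]
    labels = [None] * len(points)
    dims = len(points[0])
    for _ in range(max_iter):
        changes = 0
        # Assignment: each cell joins the cluster with the nearest centroid.
        for n, p in enumerate(points):
            best = min(range(len(centroids)),
                       key=lambda c: math.dist(p, centroids[c]))
            if best != labels[n]:
                labels[n] = best
                changes += 1
        # Update: each centroid moves to the mean of its member cells;
        # the cells themselves never move.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(m[d] for m in members) / len(members)
                                     for d in range(dims))
        if changes < min_changes:
            break
    return labels, centroids
```

For example, with cells at 0, 2, 3, and 10 and seeds at 0 and 3, the cells at 2 and 3 start in the second cluster and migrate to the first as its centroid slews toward them, ending with labels [0, 0, 0, 1].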

Figure 1 shows a visualization of 3000 clusters in a three-dimensional data space representing North America. In this case, the three dimensions are the first three principal component scores derived from nine environmental characteristics (discussed further below). Because showing individual map cells would obscure the view entirely, clusters are shown instead, with each cluster icon sized and colored according to its number of member cells. Clusters with the largest membership tend to be centrally located in data space, and cluster sizes follow a negative exponential distribution. Because the procedure generates clusters with nearly uniform within-cluster variance, all clusters have nearly equal radii in data space, regardless of membership.

Map cells with their final cluster assignments can then be re-assembled into their proper geographic positions, and the resulting ecoregion map can be color-coded by cluster assignment. Because adjacent raster cells are likely to have similar environmental values, ecoregion clusters are often geographically contiguous. However, because geographic location is not used in clustering, clusters can be spatially disjoint: two map cells with similar environments can be classified into the same ecoregion even when they are widely separated geographically. Two distant mountaintops, for example, could fall in the same ecoregion cluster if their environments are similar enough.
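Re-assembly and the possibility of disjoint ecoregions can be sketched as follows: labels are written back in row-major raster order, and a breadth-first flood fill counts the 4-connected geographic patches belonging to each cluster. Row-major order, 4-connectivity, and the function names are assumptions for illustration.

```python
from collections import deque


def reassemble(labels, ncols):
    """Put per-cell cluster labels back into row-major raster order."""
    return [labels[r * ncols:(r + 1) * ncols] for r in range(len(labels) // ncols)]


def patches_per_cluster(grid):
    """Count 4-connected geographic patches for each cluster label.

    A count above 1 means the ecoregion is spatially disjoint, which is
    possible because location is never used during clustering.
    """
    nrows, ncols = len(grid), len(grid[0])
    seen = [[False] * ncols for _ in range(nrows)]
    counts = {}
    for r in range(nrows):
        for c in range(ncols):
            if seen[r][c]:
                continue
            label = grid[r][c]
            counts[label] = counts.get(label, 0) + 1
            # Flood-fill this patch so none of its cells is counted again.
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < nrows and 0 <= nx < ncols
                            and not seen[ny][nx] and grid[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return counts
```

With labels [1, 0, 0, 1, 0, 0, 0, 0] on a 2-by-4 grid, cluster 1 occupies two separate corners, like two distant mountaintops, and is counted as two patches.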

We call this empirical process Multivariate Geographic Clustering, and have implemented it as a parallel algorithm coded in C using the Message Passing Interface (MPI). Our code is dynamically load-balancing and fault-tolerant, and it performs both initial seed-finding and iterative cluster assignment in parallel. The clustering algorithm is inherently parallelizable, since individual nodes can independently classify subsets of cells and then combine their results at the end of each iteration. We developed the Multivariate Geographic Clustering parallel algorithm and code on a highly heterogeneous Beowulf-class parallel machine built from surplus 486- and Pentium-based personal computers. Additional information on our 126-node "Stone SouperComputer" is available at http://www.esd.ornl.gov/facilities/beowulf and is described in Hoffman and Hargrove (1999).
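Why each iteration parallelizes well can be shown with per-node partial sums: each node classifies its own subset of cells and accumulates coordinate sums and member counts per cluster, and one element-wise sum (the job an MPI reduction such as MPI_Allreduce with MPI_SUM would do) yields the new centroids. This serial Python simulation is an illustration only; it does not reproduce the authors' load balancing or fault tolerance, and the function names are hypothetical.

```python
import math


def partial_sums(chunk, centroids):
    """Work one node can do alone: classify its cells, then accumulate
    per-cluster coordinate sums and member counts."""
    k, dims = len(centroids), len(centroids[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        c = min(range(k), key=lambda j: math.dist(p, centroids[j]))
        counts[c] += 1
        for d in range(dims):
            sums[c][d] += p[d]
    return sums, counts


def combine(results):
    """Element-wise sum of every node's partial results (what an MPI_SUM
    reduction would do), then division to get the new centroids.
    A cluster with no members gets None."""
    k, dims = len(results[0][1]), len(results[0][0][0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for part_sums, part_counts in results:
        for c in range(k):
            counts[c] += part_counts[c]
            for d in range(dims):
                sums[c][d] += part_sums[c][d]
    return [tuple(v / counts[c] for v in sums[c]) if counts[c] else None
            for c in range(k)]


# Simulate three "nodes", each handling a different subset of the cells.
cells = [(0.0,), (1.0,), (9.0,), (10.0,)]
centroids = [(0.0,), (10.0,)]
chunks = [cells[:1], cells[1:3], cells[3:]]
parallel = combine([partial_sums(ch, centroids) for ch in chunks])
serial = combine([partial_sums(cells, centroids)])
```

Because sums and counts are associative, how the cells are split among nodes does not change the resulting centroids, which is what lets unevenly fast machines take unequal shares of the work.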

We have performed a number of empirical regionalizations of the conterminous United States at 1 sq km resolution using up to nine environmental characteristics (Hargrove and Luxmoore 1997, 1998), and have divided the United States into as many as 7,000 distinct ecoregions (Hargrove and Hoffman 1998). At this resolution, each of the nine national environmental condition maps contains more than 7.8 million cells. These map, data, and ecoregion resolutions exceed those usually achieved by ecoregion experts.

The example ecoregions in this paper resulted from a Multivariate Geographic Clustering of the conterminous United States at a resolution of four square kilometers, using nine environmental characteristics important to plant growth: elevation, slope, soil bulk density, depth of mineral soil, depth to bedrock, mean annual temperature, mean annual precipitation, soil water-holding capacity, and mean annual solar insolation including cloud interception.

A principal component analysis (PCA) grouped soil bulk density, depth of mineral soil, and depth to bedrock into a first principal component representing soil factors. The second principal component loaded on temperature and precipitation, and inversely on elevation and slope. The third principal component was formed from solar insolation and, inversely, soil water-holding capacity. These three principal components were retained as the axes of the environmental data space and formed the basis for this regionalization. Figure 2 shows the map that results from dividing the nation into 50 distinct ecoregions based on these nine environmental conditions. The map contains about half a million cells, each with nine characteristics; parallel Multivariate Geographic Clustering can efficiently handle much larger problems. Each randomly assigned color represents a distinct ecoregion in this map.
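A PCA like the one described can be sketched without external libraries by extracting the leading eigenvectors of the covariance matrix with power iteration and deflation. The source does not say which PCA routine was used; this method, the function names, and the fixed iteration count are assumptions, and a production run on half a million cells would use a numerical library instead.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix and column means of equal-length rows."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for r in rows:
        dev = [r[j] - means[j] for j in range(m)]
        for a in range(m):
            for b in range(m):
                cov[a][b] += dev[a] * dev[b] / (n - 1)
    return cov, means


def principal_components(cov, k, iters=1000, seed=0):
    """Leading k (eigenvalue, unit eigenvector) pairs, largest first,
    found by power iteration with deflation."""
    rng = random.Random(seed)
    m = len(cov)
    mat = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        for _step in range(iters):
            w = [sum(mat[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                return pairs
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance along the converged direction.
        lam = sum(v[a] * mat[a][b] * v[b] for a in range(m) for b in range(m))
        pairs.append((lam, v))
        # Deflation removes this component so the next pass finds the next one.
        for a in range(m):
            for b in range(m):
                mat[a][b] -= lam * v[a] * v[b]
    return pairs


def scores(row, means, pairs):
    """Principal component scores: the cell's coordinates in the reduced space."""
    return [sum((row[j] - means[j]) * vec[j] for j in range(len(row)))
            for _, vec in pairs]
```

When two variables rise and fall together, as soil depth and depth to bedrock might, the first component loads almost equally on both, which is the kind of grouping the PCA above reports.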

Visualizing Ecoregion Similarity with RGB Color Coding

It is important to distinguish adjacent ecoregions visually with random colors, particularly to emphasize the location of the borders. However, ecologists may also want some visual indication of the relative mix of conditions represented by bordering ecoregions. Since the final location of a cluster centroid is, by definition, the most centrally located point in its cluster, the centroid coordinates describe the average environmental conditions in that cluster ecoregion. Comparing the centroid coordinates of two ecoregions quantifies the difference between their average environments.

If PCA has been used to condense a larger number of "raw" environmental variables into three orthogonal principal component axes in environmental data space, we can perform a one-to-one scalar mapping of the first, second, and third principal component scores to a red-green-blue (RGB) color triplet. In this way, the three coordinates of each cluster centroid specify a unique color for that ecoregion. Under this Similarity Colors scheme, the color of each ecoregion indicates its relative mix of environmental factors. Comparing adjacent ecoregions in such visualizations is simple: ecoregions containing similar environments are colored similarly.
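A minimal sketch of the Similarity Colors mapping follows. The source specifies a one-to-one scalar mapping of component scores to RGB; the min-max scaling of each component across all centroids to 0-255 is our assumption. The hypothetical channel_of parameter lets the component-to-channel assignment vary, since Figure 3 shows component 1 in green, component 2 in blue, and component 3 in red, which corresponds to channel_of=(1, 2, 0).

```python
def similarity_colors(centroids, channel_of=(0, 1, 2)):
    """Map each centroid's first three PC coordinates to an RGB triplet.

    channel_of[i] names the RGB channel (0=red, 1=green, 2=blue) that shows
    principal component i. Each component is min-max scaled across all
    centroids to 0..255 (an assumed scaling), so similar centroids receive
    similar colors.
    """
    lows = [min(c[i] for c in centroids) for i in range(3)]
    highs = [max(c[i] for c in centroids) for i in range(3)]
    colors = []
    for c in centroids:
        rgb = [0, 0, 0]
        for i in range(3):
            span = highs[i] - lows[i]
            level = (c[i] - lows[i]) / span if span else 0.0
            rgb[channel_of[i]] = round(255 * level)
        colors.append(tuple(rgb))
    return colors
```

Under this scaling a centroid that is lowest on all three components becomes black and one that is highest on all three becomes white, matching how black and white are read in Figure 3.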

Figure 3 shows the same national ecoregions as Figure 2, now colored using the Similarity Colors scheme. With the original random colors, all intervening borders are easily seen. With RGB Similarity Colors, however, the borders between some adjacent, similar ecoregions nearly disappear, and the map becomes a gradient of slowly changing colors. These Similarity Colors quantitatively reflect the mix of environmental conditions found at each point on the map.

In Figure 3, Factor 1 ("soil properties") is green, Factor 2 ("temp & precip") is blue, and Factor 3 ("solar & water-holding") is red. Black results from uniformly low values of all three factors, and white from uniformly high values. Thus, white areas in Florida, Texas, and California's Central Valley reflect high solar insolation, low water-holding capacity, high bulk density, deep soils and bedrock, high temperature and precipitation, low elevation, and gentle slopes.

Interestingly, such RGB-encoded Similarity Colors maps converge rapidly to show the same large regional trends in ecological relationships. If two regionalizations based on the same environmental conditions are produced, one divided finely into many ecoregions and the other coarsely into relatively few, the Similarity Colors versions of these two very different maps will be indistinguishable from each other. This convergence occurs even though the polygons underlying each map are completely different; only the RGB coding technique is the same. Thus, the result is relatively insensitive to the chosen number of ecoregion divisions: beyond some minimum number of ecoregions, the same regional ecological patterns are revealed. Ecologists need only inspect such visualizations to gain insight into regional environmental relationships.

Gauging "Representativeness"

Fundamental to characterizing and visualizing the sharpness of borders is quantifying how representative a particular location is of its parent ecoregion. As noted in the previous section, the final centroid of each cluster in environmental data space, being the arithmetic average of all member cells, is the best single representative of that cluster ecoregion. Map cells in the interior of the cluster, close to the centroid, are highly representative of the ecoregion, while map cells in the outer "shell" of the cluster are less representative. These outlying cells are the ones most likely to change cluster assignment if another iteration of the classification algorithm were run.

Thus, we propose that the Euclidean distance from each cell to the centroid of the cluster to which it was ultimately assigned is an easily computed measure of representativeness for that location. Because every cell has been assigned to a centroid, this representativeness value can be computed for every location in the map. The metric takes the level of ecoregion division into account, since more ecoregions mean more (and closer) cluster centroids in environmental data space. Cells close to their centroids are more representative of their cluster ecoregions than cells far from their centroids.
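The representativeness measure is a one-line computation once the final labels and centroids are known. The sketch below also reports, for each cluster, the cell with the smallest distance, which is that ecoregion's most representative location; both function names are hypothetical.

```python
import math


def representativeness(points, labels, centroids):
    """Distance from each cell to the centroid of its assigned cluster.
    Small values mark typical cells; large values mark the outer shell."""
    return [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]


def most_representative(dist, labels):
    """Index of the lowest-distance cell in each cluster (its crater floor)."""
    best = {}
    for n, (d, lab) in enumerate(zip(dist, labels)):
        if lab not in best or d < dist[best[lab]]:
            best[lab] = n
    return best
```

Because each distance is measured only to the cell's own centroid, values are comparable within a cluster and, since all clusters have similar radii, roughly comparable across clusters as well.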

Distance to Centroid as an Elevation Surface

If we map the distance from each cell to its cluster centroid back into geographic space and depict these "representativeness" values as elevations, we create a surface whose height is inversely related to the representativeness of the cell at each geographic location. Because this value can be calculated for every cell, the representativeness surface is complete and continuous across the whole map. This theoretical elevation surface reflects representativeness and is distinct from the actual topographic elevation at these locations.

Hypothetical cluster ecoregions might appear as a series of depressions or craters in such an elevation surface. We might expect that the borders between adjacent ecoregions would trace along the tops of the rims of these craters (Figure 4). The deepest spots in the craters would correspond to cells at or near the cluster centroids in environmental data space; these are the most representative (i.e., lowest) geographic locations.
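The expectation that borders follow crater rims can be checked on a small raster: mark cells that have a 4-neighbor in a different cluster as border cells, then compare their mean representativeness elevation with that of interior cells. This is a hypothetical diagnostic sketched for illustration, not a procedure from the source.

```python
def border_mask(grid):
    """True where a cell has a 4-neighbor in a different cluster."""
    nrows, ncols = len(grid), len(grid[0])
    return [[any(0 <= rr < nrows and 0 <= cc < ncols and grid[rr][cc] != grid[r][c]
                 for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
             for c in range(ncols)]
            for r in range(nrows)]


def rim_vs_interior(elev, grid):
    """Mean representativeness elevation of border cells and of interior cells.
    Returns None for a group that has no cells."""
    mask = border_mask(grid)
    rim, inner = [], []
    for r, row in enumerate(grid):
        for c in range(len(row)):
            (rim if mask[r][c] else inner).append(elev[r][c])

    def avg(xs):
        return sum(xs) / len(xs) if xs else None

    return avg(rim), avg(inner)
```

If the crater picture holds, the first value should clearly exceed the second; a border running through low ground would instead suggest the kind of over-division discussed at the end of this paper.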

Edge Profile Characteristics

We can now examine cross-sections of this theoretical representativeness surface, taken perpendicular to cluster edges, to determine the instantaneous characteristics of the borders between adjacent ecoregions. Figure 5 depicts two extreme possibilities for a hypothetical vertical cross-section between adjacent ecoregion craters, one red and one blue. On the left, each ecoregion is steep-sided and "U"-shaped, while on the right, the crater walls descend more gradually, making them "V"-shaped. The border between the steep-sided ecoregions on the left is sharp, more like an ecotone, while the border on the right is softer and more gradual, like an ecopause.

"Sidedness" of Edges

Because edge properties depend on each adjacent cluster in this scheme, each side of a border has distinct (and possibly different) properties. Although counter-intuitive at first, this property is logical when we remember that we are independently characterizing the transition from the border to the heart of EACH side. Accordingly, Figure 5 depicts hypothetical edge cross-sections of varying sharpness, including one with a mixed (fuzzy/sharp) characterization.
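Side-specific sharpness can be sketched on a single transect sampled perpendicular to a border: the mean change in representativeness per cell is computed separately on each side, and each side is classified on its own. The window width, the threshold, and the function names are hypothetical; the source characterizes sharpness visually rather than with a numeric cutoff.

```python
def side_steepness(profile, border, width):
    """Average rise per cell toward the rim, computed separately on each side.

    `profile` samples the representativeness surface along a transect
    perpendicular to a border; cells before index `border` belong to one
    cluster and cells from `border` onward to the other. `width` (>= 2)
    cells are used on each side.
    """
    if width < 2 or border < width or border + width > len(profile):
        raise ValueError("transect too short for the requested width")
    left = profile[border - width:border]   # interior -> rim of first cluster
    right = profile[border:border + width]  # rim -> interior of second cluster
    left_rise = (left[-1] - left[0]) / (width - 1)
    right_rise = (right[0] - right[-1]) / (width - 1)
    return left_rise, right_rise


def classify(rise, sharp_threshold):
    """Hypothetical cutoff: a steep side reads as ecotone, a gentle one as ecopause."""
    return "ecotone" if rise >= sharp_threshold else "ecopause"
```

On the transect [0, 2, 6, 5, 4, 3] with the border at index 3 and a width of 3, the first side rises 3.0 per cell and the second 1.0 per cell, so with a threshold of 2.0 the same border is sharp on one side and fuzzy on the other.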

Visualizing the Sharpness Characteristics of Edges

We propose using contour lines of equal representativeness to visualize the sharpness of borders between ecoregions. As Figure 6 illustrates, closely spaced representativeness contours reflect steep crater walls and therefore a sharp ecotone, whereas widely spaced contours indicate gradually sloping walls and a fuzzy, gradual ecopause. Each of the three hypothetical border types shown in plan view in Figure 6 can be paired with the corresponding profile cross-section in Figure 5. Contour lines are flexible enough to represent mixed gradual/sharp borders, as well as borders whose characteristics change along their length.
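Contour spacing can be quantified along a transect: with contours drawn every interval units, the number of contour lines crossed between two samples equals the change in floor(z / interval), and crossings per unit distance give the contour density. This sketch and its function names are assumptions, offered only to make the visual rule concrete.

```python
import math


def contour_crossings(transect, interval):
    """Count representativeness contour lines crossed along a transect.

    With a contour every `interval` units, the lines between two adjacent
    samples equal the change in contour band index floor(z / interval).
    """
    bands = [math.floor(z / interval) for z in transect]
    return sum(abs(b - a) for a, b in zip(bands, bands[1:]))


def contour_density(transect, interval, cell_size=1.0):
    """Contour lines crossed per unit distance: high density means closely
    spaced contours, a steep crater wall, and a sharp ecotone."""
    length = cell_size * (len(transect) - 1)
    return contour_crossings(transect, interval) / length
```

A steep transect such as [0.0, 0.2, 2.1, 4.3, 4.4] crosses four unit contours in four cells, while a gentle one such as [0.0, 0.4, 0.8, 1.2, 1.6] crosses only one, mirroring the dense and sparse contour patterns of Figure 6.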

Visualizing Real-World Borders

What do real-world borders look like when visualized with representativeness contour lines? Figure 7 shows a 3-dimensional visualization of the representativeness elevation surface for southwest Georgia, Alabama, and northern Florida. Each square in the mesh represents a single 4 sq km raster cell, and the elevation of that cell is obtained from the Euclidean distance from that cell's location in data space to the centroid of its cluster. Cluster membership is shown in Figure 7 as the (random) color of each cell. The representativeness topography is continuous and interpretable at this resolution.

Many geographic features are recognizable in this visualization. Atlanta, Macon, and Columbus can be seen as a discontinuous purple urban cluster. The kelly green Piedmont of central Georgia changes to the red Coastal Plain in southern Georgia. From west to east, four rivers (the Flint, Ocmulgee, Oconee, and Ogeechee) are seen as linear extensions of the green Piedmont ecoregion into the Coastal Plain. In southern Alabama, the red Coastal Plain and kelly green Piedmont ecoregion colors actually interdigitate, showing single cells of red within the green and vice versa. The light green southwestern Appalachians can be seen passing through northeast Georgia and entering eastern Alabama, and the olive drab southwestern tip of the Ridge-and-Valley ecoregion, forming a higher-elevation representativeness plateau, can be seen in northwestern Alabama.

Figure 8 shows the equal elevation contours draped onto the representativeness surface to visualize the sharpness of the ecoregion borders. The random orientation and meandering character of the contours near the red Coastal Plain and kelly green Piedmont ecoregions in southern Alabama clearly indicate that this border is an ecopause. On the other hand, the closely-spaced, parallel contour lines separating the kelly green Piedmont from the olive drab Ridge-and-Valley in northern Alabama represent this border as a sharp ecotone.

The same ecoregions, colored according to Similarity using the RGB-encoding scheme, are shown with border-sharpness contours in Figure 9. Although the colors appear simply to reflect the elevation, they are actually derived from the coordinates of each ecoregion's centroid. Abrupt color changes are accompanied by the numerous parallel contours of an ecotone, while subtle color changes are accompanied by the meandering contours of an ecopause. Over this limited extent, the colors correlate with the height of the representativeness surface; however, distant locations across the entire map with equal representativeness elevations could have substantially different environment colors. Once the interpretation of the sharpness contours is understood, a simple plan view with random ecoregion colors (Figure 10) adequately captures both the location and the character of ecoregion borders: where borders are sharp ecotones, closely spaced, regular contours merge into thick black lines.

The hallucinogenic planar view in Figure 11 shows random ecoregion colors and sharpness contours for southern California. San Francisco Bay is visible at the upper left, and the San Joaquin Valley appears as a purple ecoregion in the north and a gray ecoregion in the south. Figure 12 shows the jagged representativeness topography for southern California as a mesh with random ecoregion colors. The Coastal Range appears to the west and the Sierra Nevada to the east, separated by the much flatter and more representative San Joaquin Valley in the middle. The meandering sharpness contours show little representativeness difference between the northern purple and southern gray Valley ecoregions (Figure 13). Similarity Colors (Figure 14) indicate that the western foothills of the Sierra Nevada are similar to the Coastal Range with respect to these nine environmental characteristics, but become more distinctive farther east. Mono Lake is visible as a distinctive flat yellow plateau at the upper right, near the Nevada border.

Similar visualizations of representativeness elevations, sharpness contours, Similarity Colors, and a planar view were also created for North and South Dakota.

Even at the national scale, careful application of sharpness contours can reveal regional patterns in ecoregions. Particularly when combined with RGB Similarity Colors (Figure 15), five levels of national sharpness contours show that the Piedmont, the southern Coastal Plain, the Corn Belt Plains, and the Northwestern Great Plains share a relatively gentle representativeness topography. Although these areas are distinct, as indicated by their different Similarity Colors, the transitions between them are ecopauses, as revealed by the lack of sharpness contours. The Appalachian Mountains, along with the Rockies and the rest of the western U.S., contain many more sharp, ecotone-type boundaries than the rest of the eastern U.S. This result is consistent with most people's intuition, experience, and expectation.

Characterizing ecoregion borders is important for more than just ecological understanding. Border movement at fuzzy edges may be the first detectable evidence of climate change. Characterizing borders will also facilitate comparisons among alternative eco-regionalizations; differences in the location of sharp edges are more important than different placements of fuzzy edges. The visualizations shown here also provide a means for inspecting the appropriateness of the geographic clustering. For example, the appearance of multiple low areas within a single cluster ecoregion may suggest that a finer level of division is desirable. On the other hand, borders passing through low areas may suggest that fewer divisions are needed.
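The "multiple low areas" check might be automated by counting, per cluster, the cells that are strictly lower than all of their same-cluster 4-neighbors. This is a hypothetical heuristic sketch with an assumed function name; under the strict comparison, flat plateaus of equal values are not counted as minima.

```python
def local_minima_per_cluster(elev, grid):
    """Count cells lower than every same-cluster 4-neighbor, per cluster.

    Several separate minima ("craters") inside one cluster hint that a finer
    division might be warranted; this is a heuristic check only.
    """
    nrows, ncols = len(grid), len(grid[0])
    counts = {}
    for r in range(nrows):
        for c in range(ncols):
            lab = grid[r][c]
            counts.setdefault(lab, 0)
            neigh = [elev[nr][nc]
                     for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= nr < nrows and 0 <= nc < ncols and grid[nr][nc] == lab]
            if neigh and all(elev[r][c] < z for z in neigh):
                counts[lab] += 1
    return counts
```

On real data the representativeness surface is noisy, so in practice one would probably smooth it or require minima to be separated by some distance before treating a second crater as evidence for another division.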

We have outlined an objective, empirical technique that, given a set of environmental characteristics, can unambiguously locate, characterize, and visualize ecoregions and the borders that separate them. Once their conventions are understood, planar map views that combine Similarity Colors with sharpness contours are visually rich in ecological information and provide integrated visualizations of complex, massive environmental data sets.


References

Bailey, R.G. 1983. Delineation of ecosystem regions. Environmental Management 7:365-373.

Bailey, R.G., P.E. Avers, T. King, W.H. McNab, eds. 1994. Ecoregions and subregions of the United States (map). Washington, DC: U.S. Geological Survey. Scale 1:7,500,000; colored. Accompanied by a supplementary table of map unit descriptions compiled and edited by McNab, W.H., and R.G. Bailey. Prepared for the U.S. Department of Agriculture, Forest Service.

Bailey, R.G. 1995. Description of the ecoregions of the United States. 2nd ed. (1st ed. 1980). Misc. Publ. No. 1391. Washington, DC: U.S. Forest Service. 108 pp. with separate map at 1:7,500,000.

Bailey, R.G. 1996. Ecosystem Geography. Springer-Verlag. 216 pp.

Bailey, R.G. 1998. Ecoregions map of North America: explanatory note. Misc. Publ. No. 1548. USDA Forest Service. 10 pp. with map.

Commission for Environmental Cooperation. 1997. Ecological regions of North America: toward a common perspective. Commission for Environmental Cooperation, Montreal, Quebec, Canada. 71 pp. Map (scale 1:12,500,000).

Hargrove, W.W., and F.M. Hoffman. 1998. National Clustering. URL: http://www.esd.ornl.gov/projects/clustering/

Hargrove, W.W., and R.J. Luxmoore. 1997. A Spatial Clustering Technique for the Identification of Customizable Ecoregions. URL: http://www.esri.com/library/userconf/proc97/PROC97/TO250/PAP226/P226.HTM

Hargrove, W.W., and R.J. Luxmoore. 1998. A New High-Resolution National Map of Vegetation Ecoregions Produced Empirically Using Multivariate Spatial Clustering. URL: http://www.esri.com/library/userconf/proc98/PROCEED/TO350/PAP333/P333.HTM

Hargrove, W.W., P.M. Schwartz, and F.M. Hoffman. 1997. The Fractal Landscape Realizer. URL: http://www.esd.ornl.gov/projects/realizer/

Hoffman, F.M., W.W. Hargrove, and A.J. Schultz. 1997-1999. The Stone SouperComputer: ORNL's First Beowulf-Style Parallel Computer. URL: http://www.esd.ornl.gov/facilities/beowulf/

Hoffman, F.M., and W.W. Hargrove. 1999. Cluster computing: Linux taken to the extreme. Linux Magazine 1(1):56-59 (March).

Omernik, J.M. 1987. Ecoregions of the conterminous United States. Map (scale 1:7,500,000). Annals of the Association of American Geographers.

Omernik, J.M. 1995. Ecoregions: a spatial framework for environmental management. In: W.S. Davis and T.P. Simon (eds.), Biological Assessment and Criteria: Tools for Water Resource Planning and Decision Making. Lewis, Boca Raton, FL. pp. 49-62.

Omernik, J.M., and R.G. Bailey. 1997. Distinguishing between watersheds and ecoregions. AWRA Water Resources Bulletin 33(5).


For additional information contact:

William W. Hargrove
University of Tennessee
Oak Ridge National Laboratory*
GIS and Spatial Technologies Group
Computational Physics and Engineering Division
P.O. Box 2008, M.S. 6274
Oak Ridge, TN 37831-6274
423-241-2748 voice
423-241-3870 fax
hnw@fire.esd.ornl.gov
Forrest M. Hoffman
Oak Ridge National Laboratory*
Environmental Sciences Division
P.O. Box 2008, M.S. 6036
Oak Ridge TN 37831-6036
423-576-7680 voice
423-576-8543 fax
forrest@esd.ornl.gov
*Oak Ridge National Laboratory, managed by Lockheed Martin Energy Research Corp. for the U.S. Department of Energy under contract number DE-AC05-96OR22464.
"The submitted manuscript has been authored by a contractor of the U.S. Government under contract No. DE-AC05-96OR22464. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes."

William W. Hargrove (hnw@fire.esd.ornl.gov)
Last Modified: Wed Jun 30 19:21:01 EDT 1999