About

Decision Support Using Data Mining Methods and Geographic Information.

Peter Bajcsy, Peter Groves, Sunayana Saha and Tyler J. Alumbaugh

Technical Report NCSA-ALG03-0002, February 2003

We present a decision support system that uses data mining methods and geographic information. The geographic information consists of heterogeneous data types, both raster and vector. Raster data types represent grid-based information collected by camera sensors, e.g., satellite images, while vector data types represent either boundary information, for example, man-made or naturally defined regions, or point information, for instance, building locations, customers' permanent locations or measurement locations. The occurrence of heterogeneous geographic data types and the information associated with raster and vector data pose challenges to data analyses supporting decision makers in environmental preservation and development planning domains.

In this work, we focus on the data analysis problem of forming geographic regions as aggregations of basic boundaries under a set of decision support constraints. This problem is defined as a search for the best partition of any geographical area such that the partition is (a) based on raster or point information, (b) formed by aggregations of known boundaries, (c) constrained by the spatial locations of known boundaries and (d) minimizes an error metric.

We present our overall approach and describe the proposed optimization processes for the sub-problems, such as (1) data representation of heterogeneous data types, (2) feature selection and extraction from heterogeneous data, (3) aggregation of boundaries into geographic partitions based on feature similarity, (4) error evaluation of multiple geographic partitions and (5) visualization of heterogeneous data within a geographical context.
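
Sub-problems (3) and (4) can be illustrated with a minimal sketch. It is not the method used in the report. It assumes each known boundary has already been reduced to one scalar feature (for example, mean elevation), that spatial constraints are given as a list of adjacent boundary pairs, that aggregation is a simple greedy merge of the most similar adjacent groups, and that the error metric is the sum of squared deviations from each group's mean. The names aggregate_boundaries and partition_error are hypothetical.

```python
def aggregate_boundaries(features, adjacency, num_regions):
    """Greedily merge adjacent boundaries with the most similar mean feature.

    features:    dict mapping boundary id -> scalar feature (assumed input)
    adjacency:   iterable of (id, id) pairs of spatially adjacent boundaries
    num_regions: desired number of aggregated regions
    """
    # Each group starts as a single boundary, keyed by one of its members.
    groups = {b: {b} for b in features}
    neighbours = {b: set() for b in features}
    for a, b in adjacency:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def mean(group):
        return sum(features[b] for b in group) / len(group)

    while len(groups) > num_regions:
        # Only spatially adjacent groups are candidates for merging.
        candidates = [
            (abs(mean(groups[a]) - mean(groups[b])), a, b)
            for a in groups
            for b in neighbours[a]
            if a < b
        ]
        if not candidates:
            break  # no adjacent groups left to merge
        _, a, b = min(candidates)
        groups[a] |= groups.pop(b)
        # Redirect the neighbours of the absorbed group b to group a.
        for n in neighbours.pop(b):
            neighbours[n].discard(b)
            if n != a:
                neighbours[n].add(a)
                neighbours[a].add(n)
    return sorted(sorted(g) for g in groups.values())


def partition_error(features, partition):
    """Sum of squared deviations of each boundary from its group mean."""
    total = 0.0
    for group in partition:
        m = sum(features[b] for b in group) / len(group)
        total += sum((features[b] - m) ** 2 for b in group)
    return total


# Four boundaries in a chain A-B-C-D with made-up feature values.
features = {"A": 1.0, "B": 1.2, "C": 5.0, "D": 5.1}
adjacency = [("A", "B"), ("B", "C"), ("C", "D")]
partition = aggregate_boundaries(features, adjacency, num_regions=2)
error = partition_error(features, partition)
```

Candidate partitions produced with different values of num_regions, or with different features, can then be compared by their partition_error values.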

The proposed system is illustrated with elevation and forest label raster data, US Census Bureau boundary data and FBI crime reports for making decisions about police force deployment.

Keywords: Mining geospatial data, decision support, geospatial clustering.