Visualization and data mining tools applied to algal biomass prediction in Illinois streams

Large amounts of hydrologic, geographic, meteorological, water quality, soil type, land-use and many other types of data are available to water scientists and practitioners. These abundant and often multidimensional datasets can be analyzed using sophisticated modeling techniques that may require powerful computers to handle the computation.

Various data mining tools help us better understand the data and methods, better interpret the results, and more accurately predict the future values of hydrologic variables, and thus make better water planning and management decisions.

The Image Spatial Data Analysis (ISDA) group at the National Center for Supercomputing Applications (NCSA) has been working with the Illinois State Water Survey (ISWS) to develop a set of visualization and data mining tools for water resources research and applications.

The tools are applied to predict algal biomass from nutrients and other explanatory variables.

Several methods were explored with the observed nutrient, algal biomass and other data: extracting variables from remote sensing data, clustering variables, and modeling relationships between variables with data-driven models such as Naive Bayes classifiers or decision trees. Furthermore, solving the algal biomass prediction problem required executing several heterogeneous software tools and linking them with various datasets. We have therefore also introduced a software process management technology for performing algal biomass prediction with heterogeneous visualization and data mining software tools.
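As an illustration of one of the data-driven models named above, a Gaussian Naive Bayes classifier can predict a discretized algal biomass class (for example, low vs. high chlorophyll a) from nutrient concentrations. This is a minimal sketch, not the implementation used in the study: it assumes Gaussian class-conditional distributions, and the class labels, feature choice (total nitrogen and total phosphorus) and sample values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean/variance of every class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            # A tiny constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical samples: [total nitrogen (mg/L), total phosphorus (mg/L)].
X = [[1.0, 0.05], [1.5, 0.08], [2.0, 0.06], [6.0, 0.40], [7.5, 0.55], [8.0, 0.35]]
y = ["low", "low", "low", "high", "high", "high"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [7.0, 0.45]))  # high
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied together.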

The problem of algal biomass prediction in Illinois streams lies in explaining the variability in algal biomass, measured as chlorophyll a, based on nutrients (total or dissolved nitrogen and total or dissolved phosphorus) and other variables (water velocity, canopy cover along the streambank, stream width and depth, etc.). Algae are either the direct or the indirect cause of most problems related to nutrient enrichment.
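One simple way to quantify how much of the variability in chlorophyll a a single explanatory variable accounts for is the coefficient of determination R² of an ordinary least-squares line. The sketch below is illustrative only: the study itself used models such as Naive Bayes and decision trees, and the function names and sample values here are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ≈ a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y, a, b):
    """Fraction of the variance in y explained by the line a + b*x."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up values: total phosphorus (mg/L) vs. chlorophyll a (ug/L).
tp = [0.05, 0.10, 0.20, 0.35, 0.50]
chl = [3.0, 6.5, 9.0, 18.0, 22.0]
a, b = fit_line(tp, chl)
print(f"R^2 = {r_squared(tp, chl, a, b):.2f}")
```

An R² near 1 means the predictor explains most of the variability; ambient stream data are usually far noisier, which is one reason multivariate and nonlinear models are explored.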


Figure 1. Elevation map of Illinois with rivers and water resources (right) and its pseudocolor representation (left). Snapshots of the GeoLearn software environment developed by the ISDA group.


Figure 2. Right: selected water stations in Illinois shown on a georeferenced raster, with tabulated data on water quality parameters (temperature, elevation, habitat, etc.; not shown). Left: predicted algal biomass values at the water station locations.


Figure 3. Overlay of the elevation map of Illinois and the predicted algal biomass values at water station locations.

Our study uses a dataset for the entire state of Illinois, consisting of numerous nutrients, chlorophyll a (green) data and other variables. Although these long-term ambient datasets are incomplete and do not necessarily contain storm-event data, they represent the best currently available datasets for testing the results of this study in Illinois.
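Because the ambient records are incomplete, missing values have to be handled before modeling. A minimal sketch, assuming each record is a dictionary with `None` for a missing measurement and using listwise deletion (only one of several possible strategies, and not necessarily the one used in this study); the field names and values are hypothetical:

```python
def complete_cases(records, fields):
    """Keep only records in which every listed field has a value (is not None)."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


# Hypothetical station records; None marks a missing measurement.
records = [
    {"station": "A", "total_p": 0.12, "total_n": 3.1, "chl_a": 8.0},
    {"station": "B", "total_p": None, "total_n": 2.4, "chl_a": 5.5},
    {"station": "C", "total_p": 0.30, "total_n": 6.8, "chl_a": None},
    {"station": "D", "total_p": 0.25, "total_n": 5.9, "chl_a": 21.0},
]
usable = complete_cases(records, ["total_p", "total_n", "chl_a"])
print([r["station"] for r in usable])  # ['A', 'D']
```

Listwise deletion is simple but can discard much of a sparse dataset; imputation is the usual alternative when too many records would be lost.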


Algal biomass prediction in Illinois streams
Software: Im2Learn, GeoLearn, HDF, ARCGIFS, D2K, LIBSVM, KNN
Distributed data and computing resources: local and remote
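The software list includes a k-nearest-neighbor (KNN) component. As a rough illustration of the technique, not of that tool's interface, KNN regression predicts a value as the average target of the k training samples closest to the query. A minimal sketch with hypothetical names:

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict the mean target of the k training samples nearest to `query` (Euclidean distance)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pair each training target with its distance to the query, then sort by distance.
    ranked = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / k


# One-feature toy example: the 3 nearest samples to 1.1 are 1, 2 and 0.
print(knn_predict([[0], [1], [2], [10]], [0, 1, 2, 10], [1.1], k=3))  # 1.0
```

Because Euclidean distance mixes units, inputs such as nutrient concentrations and water velocity would normally be standardized before the distances are computed.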

[Figures: flow of the GeoLearn software, parts 1, 2 and 3]

The algal biomass prediction problem can be described as a sequence of processing steps that establish data-driven models (relationships) between input variables and algal biomass growth, and that provide computer-assisted, visualization-supported interpretation of the models for water scientists and practitioners. The flow of processing steps is illustrated above. The overarching goals of the analysis are (a) to predict algal biomass from multiple measurements gathered by water gauges, remote sensors and other instruments, using unsupervised learning and supervised modeling techniques, and (b) to improve users' understanding of the spatial and temporal variability of algal biomass.
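The processing flow can be viewed as a directed acyclic graph of steps, each of which may run a different tool. As a sketch only, assuming hypothetical step names and using Python's standard `graphlib` module (Python 3.9+) rather than the process management technology used in this project, a valid execution order follows from a topological sort:

```python
from graphlib import TopologicalSorter

# Hypothetical processing steps mapped to the steps they depend on.
STEPS = {
    "load_station_data": set(),
    "load_raster_layers": set(),
    "extract_variables": {"load_station_data", "load_raster_layers"},
    "cluster_variables": {"extract_variables"},
    "train_model": {"extract_variables"},
    "predict_biomass": {"train_model"},
    "visualize_results": {"predict_biomass", "cluster_variables"},
}


def execution_order(dependencies):
    """Return steps ordered so that every step comes after all of its prerequisites.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


for step in execution_order(STEPS):
    print("run", step)
```

`TopologicalSorter` also provides `prepare()`, `get_ready()` and `done()`, which allow independent steps, such as the two loading steps here, to be dispatched in parallel to local or remote resources.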

  • Peter Bajcsy
    Research group ISDA, National Center for Supercomputing Applications, UIUC
  • Momcilo Markus
    Illinois State Water Survey

  • Rob Kooper
    ISDA, National Center for Supercomputing Applications, UIUC
  • Luigi Marini
    ISDA, National Center for Supercomputing Applications, UIUC
  • David Clutter
    ISDA, National Center for Supercomputing Applications, UIUC
  • Qi Li
    ISDA, Computer Science Department, UIUC

We acknowledge the support of the NCSA Faculty Fellow Program for this work.