Visualization and data mining tools applied to algal biomass prediction in Illinois streams.
Large amounts of hydrologic, geographic, meteorological, water quality, soil type, land-use
and many other types of data are available for water scientists and practitioners. Those abundant and often
multidimensional datasets could be analyzed using sophisticated and complex modeling techniques that might
require powerful computers to handle the computation.
Various data mining tools help us better understand the
data and methods, better interpret the results, and more accurately predict the future
values of hydrologic variables, and thus make better water planning and management
decisions.
The Image Spatial Data Analysis (ISDA) group at the National Center for Supercomputing
Applications (NCSA) has been working together with the Illinois State Water Survey (ISWS) on a set of
visualization and data mining tools. These are being developed for water resources research and applications.
The tools are applied to predict algal biomass using nutrients and other
explanatory variables.
Several methods for extracting variables from remote sensing data, clustering variables, and modeling
relationships between variables with data-driven models, such as Naive Bayes or decision trees, were
explored with the observed nutrients, algal biomass and other data. Furthermore, in order to solve the algal
biomass prediction problem, several heterogeneous software tools had to be executed and linked together
with various data sets. Thus, we have also introduced a software process management technology for performing
algal biomass prediction with heterogeneous visualization and data mining software tools.
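
As an illustration of the kind of data-driven model mentioned above, the following is a minimal sketch of a Gaussian Naive Bayes classifier in plain Python. It is not the project's implementation: the function names, the two biomass classes ("low" and "high"), the choice of input variables and all sample values are assumptions made up for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical samples, NOT measured data:
# (total nitrogen mg/L, total phosphorus mg/L, canopy cover fraction)
SAMPLES = [
    (1.0, 0.05, 0.8), (1.5, 0.08, 0.7), (2.0, 0.10, 0.9), (1.2, 0.06, 0.6),
    (6.0, 0.40, 0.2), (7.5, 0.55, 0.1), (5.5, 0.35, 0.3), (8.0, 0.60, 0.2),
]
# Hypothetical algal biomass classes for the samples above.
CLASSES = ["low", "low", "low", "low", "high", "high", "high", "high"]

MODEL = fit_gaussian_nb(SAMPLES, CLASSES)
print(predict_gaussian_nb(MODEL, (1.3, 0.07, 0.75)))
print(predict_gaussian_nb(MODEL, (7.0, 0.50, 0.15)))
```

A decision tree could be substituted for the classifier here; Naive Bayes is used only because it fits in a few lines without external libraries.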
The problem of algal biomass prediction in Illinois streams lies in explaining the variability in
algal biomass measured as chlorophyll a, based on nutrients (total or dissolved nitrogen, and total or dissolved
phosphorus) and other variables (water velocity, canopy cover along the streambank, stream width/depth, etc.).
Algae are either the direct or indirect cause of most problems related to nutrient enrichment.
Figure 1. Elevation map of Illinois with rivers and water resources (right)
and its pseudocolor representation (left).
Snapshots of the GeoLearn software environment developed by the ISDA group.
Figure 2. Selected water stations of Illinois; georeferenced raster (right) with tabulated
data of different parameters related to the water quality (temperature, elevation, habitat, etc.) (not shown).
Left: Algae predicted values at water station locations.
Figure 3. Overlay of the elevation map of Illinois and the Algae predicted values at water station locations.
Our study uses a dataset for the entire state of Illinois, consisting of numerous nutrients, chlorophyll a (green)
data and other variables. Although these long-term ambient datasets are incomplete and do not necessarily contain storm-event
data, they represent the best currently available datasets for testing the results of this study in Illinois.
Technical
Algal biomass prediction in Illinois streams
Software: Im2Learn, GeoLearn, HDF, ARCGIFS, D2K, LIBSVM, KNN
Distributed data and computing resources: Local and Remote



The algal biomass prediction problem can be described as a sequence of processing steps to establish
data-driven models (relationships) between input variables and algal biomass growth, and to provide
computer-assisted interpretation of the models supported by visualization for water scientists
and practitioners. The flow of processing steps is illustrated above. The overarching goals of
the analysis are (a) to predict algal biomass from multiple measurements gathered using water gauges,
remote sensors and other instruments with unsupervised learning and supervised modeling techniques and
(b) to improve users' understanding of algal biomass spatial and temporal variability.
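
The sequence of steps described above can be sketched in plain Python as: standardize the input variables, group the stations with an unsupervised method, then predict chlorophyll a with a supervised model. This is only an illustrative sketch under stated assumptions. k-means and k-nearest-neighbor regression are stand-ins chosen for brevity, the function names are hypothetical, and the station values are made up rather than taken from the Illinois datasets.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance; also return the (mean, std) pairs."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, std))
    scaled = [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]
    return scaled, stats


def kmeans(rows, k, iterations=20):
    """Plain k-means, initialized deterministically from the first k rows."""
    centers = [rows[i] for i in range(k)]
    assignment = [0] * len(rows)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: math.dist(row, centers[c])) for row in rows
        ]
        for c in range(k):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers


def knn_predict(train_rows, train_targets, query, k=3):
    """Average the targets of the k nearest training rows (KNN regression)."""
    nearest = sorted(
        zip(train_rows, train_targets), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical stations, NOT measured data:
# (total nitrogen mg/L, total phosphorus mg/L, water velocity m/s)
STATIONS = [
    (1.0, 0.05, 0.6), (1.4, 0.07, 0.5), (1.2, 0.06, 0.7),
    (6.0, 0.40, 0.2), (7.0, 0.50, 0.1), (6.5, 0.45, 0.3),
]
# Hypothetical chlorophyll a values for the stations above.
CHL_A = [5.0, 6.0, 4.0, 40.0, 50.0, 45.0]

# Step 1: put all variables on a common scale.
SCALED, STATS = standardize(STATIONS)
# Step 2 (unsupervised): group stations with similar conditions.
CLUSTERS, _ = kmeans(SCALED, k=2)
# Step 3 (supervised): predict chlorophyll a at a new, hypothetical station.
QUERY = tuple((v - m) / s for v, (m, s) in zip((6.4, 0.44, 0.2), STATS))
PREDICTION = knn_predict(SCALED, CHL_A, QUERY, k=3)
print(CLUSTERS, PREDICTION)
```

The cluster labels and the prediction could then be joined back to the station coordinates for map-based visualization, which is the interpretation step the text describes.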