Hyperspectral image data mining for band selection in agricultural applications.

S.G. Bajwa, P. Bajcsy, P. Groves, L.F. Tian

Transactions of the American Society of Agricultural Engineers 47, pp. 895-907 (2004).

Hyperspectral remote sensing produces large volumes of data, often requiring hundreds of megabytes to gigabytes of storage for a single data collection over a small geographic area. Although the high spectral resolution of hyperspectral data is useful for capturing and discriminating subtle differences in the geospatial characteristics of a target, the data contain redundant information at the band level. The objective of this study was to identify the bands that contain the most information needed to characterize a specific geospatial feature with minimal redundancy. Band selection was performed with both unsupervised and supervised approaches. Five methods (three unsupervised and two supervised) were proposed and compared for identifying hyperspectral image bands that characterize soil electrical conductivity and canopy coverage in agricultural fields. The unsupervised methods were an information entropy measure and the first and second derivatives along the spectral axis. The supervised methods selected hyperspectral bands based on supplemental ground truth data, using principal component analysis (PCA) and artificial neural network (ANN) based models. Each hyperspectral image band was ranked by all five methods, and each method selected its twenty best bands, with the focus on soil and plant canopy characterization in precision agriculture. The results showed that each of these methods may be appropriate for different applications. The entropy measure and PCA were useful for selecting bands with the most information content, while the derivative methods could be used to identify absorption features. The ANN measure was the most useful for selecting bands specific to a target characteristic with minimal information redundancy.
The results also indicated that combining wavebands of different bandwidths would allow fewer than 20 bands to represent the information contained in the top 20 bands selected in this study, thus reducing image data dimensionality and volume considerably.
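The two unsupervised criteria described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the image is assumed to be a list of bands, each a flat list of pixel values; band entropy is computed (as an assumption) from a 32-bin equal-width histogram; and the derivative criterion is taken, hypothetically, as the mean absolute first or second finite difference along the spectral axis. The helper names `band_entropy`, `mean_abs_derivative` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: a hyperspectral image is a list of bands, each band a flat
# list of pixel values of equal length (all bands cover the same pixels).
Cube = List[List[float]]


def band_entropy(band: Sequence[float], bins: int = 32) -> float:
    """Shannon entropy (bits) of one band's equal-width histogram (assumed binning)."""
    lo, hi = min(band), max(band)
    if hi == lo:
        return 0.0  # a constant band carries no information
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in band:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    n = len(band)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def mean_abs_derivative(cube: Cube, order: int = 1) -> List[float]:
    """Hypothetical derivative score: mean |finite difference| along the spectral axis.

    order=1 uses the central first difference, order=2 the second difference.
    The first and last bands score 0 because the 3-band stencil does not fit.
    """
    n_bands, n_pix = len(cube), len(cube[0])
    scores = [0.0] * n_bands
    for b in range(1, n_bands - 1):
        total = 0.0
        for p in range(n_pix):
            if order == 1:
                d = (cube[b + 1][p] - cube[b - 1][p]) / 2.0
            else:
                d = cube[b + 1][p] - 2.0 * cube[b][p] + cube[b - 1][p]
            total += abs(d)
        scores[b] = total / n_pix
    return scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring bands, best first (ties keep the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
```

With a real image, `top_k(scores, 20)` would return the 20 top-ranked band indices for a given criterion, mirroring the twenty-band selection in the study. The supervised PCA and ANN rankings additionally require ground truth data and are not sketched here.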

Keywords: Band selection, Data mining, Hyperspectral, Precision agriculture, Remote sensing