Variable Relevance Assignment using Multiple Machine Learning Methods

Wei-Wen Feng

M.S. dissertation, Department of Computer Science, University of Illinois at Urbana-Champaign, 2006
Peter Bajcsy, Advisor

With advances in remote sensing, various machine learning techniques can be applied to study relationships among variables. Although prediction models obtained with machine learning techniques are suitable for prediction, they do not explicitly provide a means of determining the relevance of input variables to output variables. This relevance information is often of interest to scientists because the relationships among variables are unknown.

In this thesis, we investigate the issue of relevance assignment for multiple machine learning models applied to remote sensing variables in the context of terrestrial hydrology.

Relevance is defined as the influence of an input variable on the prediction of the output. We follow the classical conceptual definition of relevance and introduce a methodology for assigning relevance using various machine learning methods. The learning methods we use include Regression Tree, Support Vector Machine, and K-Nearest Neighbor.

We derive a relevance computation scheme for each learning method and propose a method for fusing the relevance assignment results from multiple learning techniques through averaging and voting mechanisms. All methods are evaluated on synthetic and measured data in terms of the accuracy of their relevance assignments. The main contributions of this thesis are a methodology for relevance assignment for multiple learning methods based on local regression, and fusion methods that provide better robustness.
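The fusion step can be illustrated with a minimal sketch. It is not the scheme derived in the thesis. It assumes that each learning method has already produced one non-negative relevance score per input variable, that averaging means taking the mean of normalized scores, and that voting means each method casts one vote for its top-ranked variable. The function names and the example scores are hypothetical.

```python
from collections import Counter


def normalize(scores):
    """Scale non-negative relevance scores so that they sum to 1."""
    total = sum(scores)
    if total == 0:
        # No method-specific information: fall back to uniform relevance.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def fuse_by_averaging(relevances):
    """Assumed averaging rule: mean of the normalized score vectors."""
    normalized = [normalize(r) for r in relevances]
    n_methods = len(normalized)
    n_vars = len(normalized[0])
    return [sum(r[j] for r in normalized) / n_methods for j in range(n_vars)]


def fuse_by_voting(relevances):
    """Assumed voting rule: each method votes for its top-ranked variable."""
    n_vars = len(relevances[0])
    votes = Counter(max(range(n_vars), key=lambda j: r[j]) for r in relevances)
    return [votes[j] / len(relevances) for j in range(n_vars)]


# Hypothetical relevance scores for three input variables (illustration only).
per_method = {
    "regression_tree": [0.6, 0.3, 0.1],
    "support_vector_machine": [0.5, 0.4, 0.1],
    "k_nearest_neighbor": [0.2, 0.7, 0.1],
}

averaged = fuse_by_averaging(list(per_method.values()))
voted = fuse_by_voting(list(per_method.values()))
print("averaged:", averaged)
print("voted:", voted)
```

With these made-up scores the two rules rank the first two variables differently: voting favors the first variable because two of the three methods rank it highest, while averaging slightly favors the second because one method scores it very strongly.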