: Class Isodata

Class Isodata

The class Isodata implements a tool for classification of two-dimensional multivariate (multi-band) images using an extended K-means algorithm known as Isodata clustering.

Description: The Isodata algorithm computes (NumClusIn +/- NumClusDelta) clusters. The clusters are created by iterative assignments of n-dimensional points. The assignment is based on the minimum Euclidean distance of a point from all current cluster centroids. After each iteration, the cluster centroids are updated to the centroid of all currently assigned points. The initial set of centroids called seeds are obtained prior to the clustering run.

The following input parameters are present in the IsodataDialog:
NumClusIn is the number of desired clusters.
StDev is the standard deviation threshold for splitting a cluster.
MaxIterations is the maximum number of iterations before the algorithm is terminated.
MinSampPerCluster is the minimum number of samples per cluster in order to consider the cluster to be valid. If a cluster contains fewer than MinSampPerCluster points then it will be discarded and the points will be reassigned.
Lump is the minimum Euclidean distance between two valid clusters. If the distance is smaller than Lump then the cluster will be merged.
MoveCentroidsThresh is the threshold value for terminating the algorithm. If none of the cluster centroids has moved by more than MoveCentroidsThresh then the algorithm is terminated.
SubSampRow and SubSampCol are the two sub sampling parameters for speeding up the computation. First, the original image is sub-sampled by taking every SubSampRow row and every SubSampCol column. Second, the algorithm is executed on the sub-sampled image. Third, the final assignment is established using the centroids obtained from the sub-sampled version.

Setup: After all the input parameters were set, a user selects a method for finding seeds. There are currently four methods for finding seeds. First method "First N Points" takes the first NumClusIn points of image as the set of seeds. Second method, "Separated by 2*StDev", searches the current image for NumClusIn seeds that are separated by at least a 2*StDev Euclidean distance. The third method, "Random", selects NumClusIn random points from the current image such that no two seeds are identical. The last method, "Load From File", loads a text file with the seeds. In this case, the value of NumClusIn is updated based on the number of seeds loaded from a file. By clicking "ShowSeeds", the seeds will be displayed in the text area labeled as Results. The button "SaveSeeds" allows the user to run multiple images with the same set of seeds.
If the speed of the algorithm is a major concern then the values of SubSampRow and SubSampCol should be used. These values provide a way for trading the accuracy of the results with the speed of computation. Another way to reduce the execution time is to lower the value of MaxIterations.

Run: To run the algorithm click on the "Cluster" or "ClusterMask" buttons. ClusterMask requires loading a mask file of the same size as the original image to be processed. The mask file should be in a tif file format and its valid pixel should have intensity equal to (byte)-1. The mask can be created by using the Threshold tool.
The button "Stats" computes basic statistics of the final clusters, such as,
- number of points inside of a cluster (Size),
- average Euclidean distance for each cluster (Average Dist.),
- standard deviation vector for each cluster (StDev Vector),
- maximum standard deviation vector (StDevVecMax) with components that are equal to a maximum standard deviation per each dimension of StDev Vector,
- mutual centroids distance for all pairs of cluster centroids.
This set of statistics is shown in the text area labeled as Results after the button "ShowStats" is clicked.

Results:
"ShowCentroids" shows the final centroids in the text area labeled as Results.
"SaveCentroids" saves the final centroids into a text file.
"ShowLabels" creates a new frame with the label image. The choice list "EnhanceLabels" provides some methods for better visualization. The choice "None" leaves the label image intact with the label values ranging from 1 to number of found clusters. Another selection, "Gray Hist. Eq.", enhances the label image in the grayscale domain by maximizing its contrast. Similarly, the choice "RGB Hist. Eq." maximizes the label image contrast in the RGB space. The last choice "Centroid Color" replaces the labels with the corresponding centroids. This particular visualization form is perceptually intuitive for comparing label images with original images.
"SaveLabels" saves the final label image to a hard drive in the image format determined based on the setting in the "EnhanceLabels" choice list.

Release notes: The following parameters cannot be set from the IsodataDialog but might be added to the IsodataDialog in future.
NumClusDelta is the allowed variation of NumClusIn expressed in the percentage of the NumClusIn value. The default value is equal 0.25.
MaxPairLump is the maximum number of cluster pairs that will be merged if their mutual distance is smaller than the parameter Lump. The default value is set to 0.02 * NumClusIn.
MaskVal is the mask value of a valid pixel. The default value is (byte)-1. The current implementation supports a grayscale mask loaded from a tif file. The mask can be easily created by using the Threshold tool.