  Image To Learn (Im2Learn)

The motivation for developing Im2Learn (Image to Learn) comes from academic, government and industrial collaborations that involve the development of new computer methods and solutions for understanding complex data sets. Images and other types of data generated by various instruments and sensors form complex and highly heterogeneous data sets and pose challenges for knowledge extraction. In general, the driver for the Im2Learn suite of tools is to close the gap between complex multi-instrument raw data and knowledge relevant to a specific application. The objective of the Im2Learn suite of tools is to research and develop solutions to real-life problems in the application areas of machine vision, precision farming, land use and land cover classification, map analysis, geo-spatial information systems (GIS), synthetic aperture radar (SAR) target and multi-spectral scene modeling, video surveillance, bio-informatics, microscopy and medical image processing, and advanced sensor environments. The main goal of Im2Learn research and development is to automate repetitive, laborious and tedious analysis tasks and to build user-friendly decision-making systems that operate in automated or semi-automated mode in a variety of applications. The development is based on theoretical foundations of image and video processing, computer vision, data fusion, statistical and spectral modeling,


  CyberIntegrator

CyberIntegrator is a highly interactive, exploratory environment for managing scientific processes. It is designed to support earth observatories and to address the many needs of scientific processes. The current implementation of CyberIntegrator enables users (1) to browse registries of data, software tools and computational resources, (2) to create meta-workflows by example (step-by-step execution), (3) to re-use and re-purpose meta-workflows, (4) to execute meta-workflows locally or remotely, (5) to incorporate heterogeneous code executors and tools and link them transparently, (6) to receive recommendations about workflow completion, (7) to search for data, tools and resources in registries, and (8) to process streaming data and large, out-of-core data.

  Geospatial Image To Learn (GeoLearn)

GeoLearn is designed to enable rapid processing of large satellite remote sensing data sets available in the HDF-EOS format. It has been tested primarily with MODIS land-surface data products. The use and analysis of these data sets are at the heart of a variety of scientific investigations into the interaction between the land surface and climate, and into the prediction of terrestrial hydrologic processes.

  Image Provenance To Learn (IP2Learn)

Image Provenance To Learn (IP2Learn) is a simulation framework that is designed for understanding preservation and reconstruction archival requirements for a class of decisions based on image inspection. IP2Learn allows users to analyze computational costs of information gathering as a function of information granularity and then assess the potential value of preserved information from decision process reconstructions.

  Spatial Pattern To Learn (SP2Learn)

SP2Learn presents a framework for accurate estimation of geospatial models from sparse field measurements using image processing and machine learning. The goal is to improve our understanding of the underlying physical phenomena and increase the accuracy of geospatial models. A typical process of building a geospatial model includes interpolation of sparse field measurements, application of existing physics-based models, incorporation of spatial constraints using image processing techniques, exploration of auxiliary raster measurements using machine learning, and optimization of all algorithmic parameters in a supervised as well as an unsupervised manner. SP2Learn allows users to explore the accuracy improvements obtained when several image de-noising techniques are combined with a decision tree machine learning technique, and when multiple remote sensing and terrestrial raster measurements are used. For example, we provide test data to illustrate how to incorporate and mine slope, soil type and proximity to water bodies for predicting groundwater recharge and discharge (R/D) rate models.
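The first step of the process described above, interpolating sparse field measurements onto a raster, can be illustrated with a minimal sketch. It assumes inverse distance weighting (IDW), which is one common interpolation scheme and is not necessarily the method SP2Learn uses. The function names `idw_interpolate` and `interpolate_grid`, the `power` parameter and the sample layout are hypothetical and chosen only for illustration.

```python
def idw_interpolate(samples, x, y, power=2.0):
    """Estimate a value at (x, y) from sparse (xi, yi, value) samples.

    Assumption: inverse distance weighting, used here as a stand-in
    for whatever interpolation a real geospatial pipeline applies.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        dist_sq = (x - xi) ** 2 + (y - yi) ** 2
        if dist_sq == 0.0:
            # The query point coincides with a measurement: return it as-is.
            return value
        # Weight is 1 / distance**power, written in terms of squared distance.
        weight = 1.0 / dist_sq ** (power / 2.0)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def interpolate_grid(samples, width, height, power=2.0):
    """Fill a height x width raster (list of rows) with IDW estimates."""
    return [
        [idw_interpolate(samples, col, row, power) for col in range(width)]
        for row in range(height)
    ]
```

A raster produced this way would then be the input to the later steps (de-noising, mining of auxiliary rasters), which this sketch does not cover.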


  Polyglot

Supporting conversion between formats is a daunting software engineering task. Many file formats exist within each data domain. On top of that, many of these formats are closed, meaning no specification is available for building the required file loader; many are proprietary and not standardized, meaning they can change and even vanish over time; and even open formats can be a problem, as many have huge specifications. The Polyglot conversion service tackles the problem of format conversion from the opposite end of the spectrum. Rather than attempting the nearly impossible task of writing loaders for every file format, it uses the notion of imposed code reuse to utilize third-party loaders locked away within vendor-released software binaries. Motivated by the 3D file domain, where vendors have created a new format for nearly every new software package and few libraries are available, Polyglot wraps the software itself to create a usable interface through which to call embedded operations. The current release, based on the AutoHotkey scripting language, is extensible to new software, scales simply to multiple machines, includes a web interface and a Java API, and offers information loss analysis capabilities for the 3D file format domain.

  Document To Learn (Doc2Learn)

Doc2Learn provides side-by-side visual comparisons of documents for quickly exploring their contents in terms of word frequency (text, integers and floating point numbers), image color histograms (frequency of colors in an image), and frequency of encoded vector graphics. It can also group documents automatically based on the similarity of all components in the documents. The pair-wise similarities can be displayed and interacted with to investigate different parameters for grouping documents. Once grouping parameters have been selected, the software enables manual re-shuffling of documents among groups, as well as re-ordering of the documents within a group. The order of documents within a group is automatically established based on file time stamps. Finally, several attributes of documents in each group are extracted and displayed according to their temporal order. The attributes are evaluated using a few rules in order to perform integrity checks of a set of related documents in a group.
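The word-frequency part of the pair-wise comparison can be illustrated with a minimal sketch. It assumes cosine similarity over word counts, which is a common choice and not necessarily the measure Doc2Learn uses. The function names and the `docs` mapping (document name to extracted text) are hypothetical, and image and vector graphics components are not covered.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Count word and number tokens in a document's text (case-insensitive)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity of two frequency tables; 0.0 if either is empty."""
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[token] * freq_b[token] for token in shared)
    norm_a = math.sqrt(sum(count * count for count in freq_a.values()))
    norm_b = math.sqrt(sum(count * count for count in freq_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(docs):
    """Map each unordered pair of document names to its similarity score."""
    names = sorted(docs)
    freqs = {name: word_frequencies(docs[name]) for name in names}
    return {
        (a, b): cosine_similarity(freqs[a], freqs[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Grouping could then be done by thresholding these scores; the threshold would play the role of one of the grouping parameters mentioned above.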

  File To Learn (File2Learn)

File2Learn is a prototype of an information gathering and exploratory framework for discovering file relationships. Many contemporary digital files are replicated during dissemination, updated on a regular basis, or partially re-written. In addition, there are many cases when information from multiple files is used to create a new file, or information from one file is split across many files. Thus, relationships among digital files become very complex if the information they contain is tracked through the process of design, creation, dissemination and modification. Furthermore, it is quite common that the information about file relationships has been lost by the time digital files are scheduled for archiving. Our objective has been to design and develop an exploratory framework that assists archivists in gathering information about potential links between files and in visual exploration that supports the discovery of file relationships.

  Conversion Software Registry (CSR)

CSR is a search tool for finding software, or a chain of software packages, capable of converting from one format to another. Users can search through the software packages already entered and their input and output formats, and can add and edit software and formats. The CSR database also contains a list of file formats identified by extension, Multipurpose Internet Mail Extensions (MIME) type and PRONOM Unique Identifier (PUID), an identifier for file formats. Search options include Conversion, Software, Extension, MIME and PUID. The visualization part includes an I/O-Graph for checking which applications can be used to convert between a selected source and target file format.
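The search for a software path between two formats can be sketched as a shortest-path search over an input/output graph. The example below is a minimal illustration, assuming a breadth-first search and a registry represented as (software, input format, output format) tuples. It is not CSR's actual implementation or schema, and the tool names in `CONVERSIONS` are placeholders.

```python
from collections import deque

# Hypothetical registry entries: (software, input format, output format).
CONVERSIONS = [
    ("ToolA", "obj", "ply"),
    ("ToolB", "ply", "stl"),
    ("ToolC", "stl", "x3d"),
    ("ToolD", "obj", "dae"),
]


def find_conversion_path(conversions, source, target):
    """Return the shortest list of (software, input, output) steps, or None."""
    if source == target:
        return []

    # Build the I/O graph: each format maps to the conversions that read it.
    graph = {}
    for software, fmt_in, fmt_out in conversions:
        graph.setdefault(fmt_in, []).append((software, fmt_out))

    # Breadth-first search, remembering how each format was first reached.
    came_from = {source: None}
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        for software, next_fmt in graph.get(fmt, []):
            if next_fmt in came_from:
                continue
            came_from[next_fmt] = (fmt, software)
            if next_fmt == target:
                # Walk back from the target to the source to rebuild the path.
                path = []
                current = target
                while came_from[current] is not None:
                    previous, used = came_from[current]
                    path.append((used, previous, current))
                    current = previous
                return path[::-1]
            queue.append(next_fmt)
    return None
```

With the placeholder registry above, `find_conversion_path(CONVERSIONS, "obj", "x3d")` yields a three-step chain through ToolA, ToolB and ToolC, while a request for a conversion with no connecting software returns `None`.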