About

Understanding Computational Requirements of Preservation and Reconstruction

1. Data Integration and Information Gathering about Decision Processes Using Geospatial Electronic Records

Our work addresses the tradeoffs of electronic information preservation in terms of file format, data volumes and computational requirements. We have evaluated storage and retrieval efficiency of boundary data representations for LLS, TIGER and DLG data structures.

2. Simulation Environment for Understanding Computational Requirements of Preservation and Reconstruction

IP2Learn simulation framework

We focused on evaluating the information value, data volumes and computational requirements during decision making processes when preservation is our main objective.

The problem is stated as information gathering about decision processes using geospatial electronic records, and is described in more detail below.

The government makes a large number of high-confidence decisions using geospatial electronic records. Decision makers might process maps and photographs (raster data), vector data that represent linear features such as county boundaries or streams, and statistics in tabular form to arrive at a decision that affects the lives of many citizens. The problem is to document, preserve, and reconstruct these processes later, often years after the initial decision was made.

While any government decision process is complicated on its own, tracking the analysis of geospatial electronic records that support government decisions adds another layer of complexity. We must understand both the cost of information preservation and the value of the preserved information. A team from the National Center for Supercomputing Applications works with the National Archives and Records Administration to provide a software tool, Image Provenance to Learn (IP2Learn), for:

  • simulating complicated high-confidence decision scenarios,
  • preserving the gathered information in temporally sustainable data containers, and
  • reconstructing high-assurance decision making processes.

Current progress

Using IP2Learn, we have been conducting trade-off studies related to encryption, compression, storage file format, information gathering mechanisms and meta-data organization. We have also expanded the simulation framework with semi-automated generation of reports documenting the decision processes.
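To make the kind of trade-off study mentioned above concrete, the following is a minimal sketch, not the IP2Learn implementation. It assumes a synthetic closed boundary polyline and compares two hypothetical storage encodings (JSON text and packed 64-bit floats) by raw size, compressed size and compression time, using zlib from the Python standard library. The function names, the two encodings and the compression level are illustrative assumptions and do not come from the project.

```python
import json
import math
import struct
import time
import zlib


def make_boundary(n_points):
    """Build a synthetic closed polyline (hypothetical stand-in for a boundary)."""
    return [
        (round(-88.0 + math.cos(2 * math.pi * i / n_points), 6),
         round(40.0 + math.sin(2 * math.pi * i / n_points), 6))
        for i in range(n_points)
    ]


def encode_text(points):
    """Encode the points as JSON text (assumed text-based storage format)."""
    return json.dumps(points).encode("utf-8")


def encode_binary(points):
    """Encode the points as little-endian float64 values (assumed binary format)."""
    flat = [coord for point in points for coord in point]
    return struct.pack(f"<{len(flat)}d", *flat)


def measure(name, payload):
    """Record raw size, compressed size and compression time for one encoding."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, 6)  # level 6 is an arbitrary choice
    elapsed = time.perf_counter() - start
    # Verify that the stored bytes can be reconstructed exactly.
    assert zlib.decompress(compressed) == payload
    return {
        "format": name,
        "raw_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "compress_seconds": elapsed,
    }


def run_tradeoff(n_points=1000):
    """Compare the two assumed encodings on the same synthetic boundary."""
    points = make_boundary(n_points)
    return [
        measure("json-text", encode_text(points)),
        measure("float64-binary", encode_binary(points)),
    ]


if __name__ == "__main__":
    for row in run_tradeoff():
        print(row)
```

Each row reports the storage cost (bytes) and the computational cost (seconds) of one encoding for the same data. A real study would substitute actual boundary files and the formats under evaluation.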


People, Publications, Presentations

Sang-Chul Lee
Rob Kooper
Peter Bajcsy

Research group ISDA, National Center for Supercomputing Applications, UIUC

  • Peter Bajcsy and Sang-Chul Lee, "Understanding Challenges in Preserving and Reconstructing Computer-Assisted Medical Decision Processes.", Workshop on Machine Learning in Biomedicine and Bioinformatics (MLBB07) of the 2007 International Conference on Machine Learning and Applications (ICMLA'07) December 13-15, 2007, Cincinnati, Ohio
  • P. Bajcsy, "Data Processing and Analysis.", Chapter IV (pp. 258-378) of the book Hydroinformatics: Data Integrative Approaches in Computation, Analysis, and Modeling, eds. P. Kumar, J. Alameda, P. Bajcsy, M. Folk and M. Markus, vol. 2, p534, CRC Press LLC 2006.
  • D. Clutter and P. Bajcsy, "Tradeoff Studies about Storage and Retrieval Efficiency of Boundary Data Representations for LLS, TIGER and DLG Data Structures.", Proceedings of the 2005 Symposium on Document Image Understanding Technology, College Park, Maryland, November 2-4, 2005.

This research was supported by a National Archives and Records Administration (NARA) supplement to NSF PACI cooperative agreement CA SCI-9619019. We also acknowledge NCSA/UIUC.