This project topic covers techniques for processing large volumes of historical paper documents and/or documents of large size, for archiving and visualizing them, and for extracting important information from them. The techniques include optical character recognition from paper scans, automated detection of regions of interest, information aggregation and classification, and building knowledge repositories from the extracted information. In special cases (the Abraham Lincoln papers), we are also interested in problems associated with the design and functionality of web-based interfaces (portals and virtual observatories) for educational and research purposes.

Project descriptions

Digging into Image Data

Digging into Image Data to Answer Authorship Related Questions (DID-ARQ) seeks to explore authorship studies of visual arts through computational image analyses.
In the past, authorship has been explored in terms of attributions, typically of either individual masterpieces or small collections of art from the same period, location, or school. To our knowledge, there have to date been no studies of image analyses targeting the problem of authorship applied to very large collections of images and evaluated in terms of accuracy over diverse datasets.

DID-ARQ investigates the accuracy and computational scalability of adaptive image analyses when they are applied to diverse collections of image data.

[Images: medieval manuscript, map of China, quilts]

Examples from the three image datasets: a fifteenth-century manuscript, seventeenth- and eighteenth-century maps, and quilts from the last two hundred years.

This effort will use three datasets of visual works (15th-century manuscripts, 17th- and 18th-century maps, and 19th- and 20th-century quilts) to investigate what might be revealed about the authors and their artistic lineages by comparing manuscripts, maps, and quilts across four centuries.

Cyber Tools to Aid Understanding of the Medieval French Book Trade
Cyber Connoisseurship

The Art History, French, and Medieval Studies programs at the University of Illinois are working with NCSA, in collaboration with several European institutions, to develop cyber tools for analyzing the visual imagery embedded in several early manuscripts of Jean Froissart's Chronicles. These manuscripts have been digitized, mounted on the web, and made available through Virtual Vellum (see also Virtual Vellum Overview).


Bicentennial celebration of Abraham Lincoln's birth

With the bicentennial of Abraham Lincoln's birth approaching in 2009, our work is motivated by the goal of delivering information about Lincoln's life to the scientific and educational communities. Many Lincoln documents have already been studied and made available to the public through books, monographs, and initiatives, some of which are available online from the Library of Congress, the Abraham Lincoln Presidential Library, The Papers of Abraham Lincoln, and The Collected Works of Abraham Lincoln. However, the existing virtual spaces accessible via the Internet usually do not provide a comprehensive view of the fast-growing amount of digital information about Lincoln's life. Our objective is to integrate heterogeneous data sources in a virtual observatory and provide access to the temporal, spatial, and contextual dimensions of the underlying large volume of data.

The overall project to digitize, store, and make publicly available all Lincoln writings is a joint effort of multiple institutions, including NCSA, the Illinois Historic Preservation Agency, and the Abraham Lincoln Presidential Library and Museum.

As part of this project, the design of the virtual observatory has two major objectives. The scientific objective reflects our broader interest in the manipulation and image processing of terabytes of data. The educational objective is to make the documents accessible to the general public, students, and scholars through a web-based interface. The information delivery system supports data browsing, text query-based searching, geospatial data retrieval and visualization, and transcription services for converting image scans of handwriting to text.

Example of the multidimensional view - letter

Figure 1. Web-based interface. In this example, a letter was sent from Fort Randall to President Abraham Lincoln on October 26, 1862. The metadata for the document are known, namely the time, the location of the sender, and the location of President Lincoln. The letter's path is visualized in Google Maps, and the document can be retrieved from the database and edited. Additionally, the user can overlay one of the historical maps. The markers are positioned with high accuracy based on the latitude and longitude of the historical sites.
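The kind of metadata record described in Figure 1 can be sketched as follows. This is only an illustration and not the project's actual data model: the class names, the field names, the document identifier, and the function letter_path_feature are hypothetical, the coordinates are rough illustrative values, and placing President Lincoln in Washington on that date is an assumption made for the example. The sketch turns one record into a GeoJSON LineString, a format that many map viewers can draw as a path.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Place:
    name: str
    lat: float  # degrees north
    lon: float  # degrees east (negative values are west)


@dataclass(frozen=True)
class LetterRecord:
    doc_id: str        # hypothetical identifier, not a real catalogue number
    sent_on: date
    origin: Place      # location of the sender
    destination: Place  # location of the recipient


def letter_path_feature(rec: LetterRecord) -> dict:
    """Build a GeoJSON Feature whose geometry is the sender-to-recipient path."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are ordered [longitude, latitude].
            "coordinates": [
                [rec.origin.lon, rec.origin.lat],
                [rec.destination.lon, rec.destination.lat],
            ],
        },
        "properties": {
            "doc_id": rec.doc_id,
            "sent_on": rec.sent_on.isoformat(),
            "from": rec.origin.name,
            "to": rec.destination.name,
        },
    }


# Illustrative values only: approximate coordinates, assumed destination.
letter = LetterRecord(
    doc_id="example-0001",
    sent_on=date(1862, 10, 26),
    origin=Place("Fort Randall", 43.05, -98.55),
    destination=Place("Washington, D.C.", 38.90, -77.04),
)

feature = letter_path_feature(letter)
geojson_text = json.dumps(feature, indent=2)
print(geojson_text)
```

Keeping the time, the two locations, and the document identifier in one record means the same object can serve a temporal query (by sent_on), a spatial query (by coordinates), and a document lookup (by doc_id).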