The Conversion Software Registry

Michal Ondrejcek, Kenton McHenry, and Peter Bajcsy

National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

Microsoft Research eScience Workshop 2010
Berkeley, CA, October 11–13

We have designed a web-based Conversion Software Registry (CSR) that collects information about software capable of file format conversions. The work is motivated by a community need to find file format conversions that cannot be located with current search engines, and by the specific need to support systems that actually perform conversions, such as NCSA Polyglot.

In addition, the CSR complements existing file format registries such as the Unified Digital Formats Registry (UDFR, formerly GDFR) and PRONOM, and introduces software quality information obtained by content-based comparisons of files before and after conversion. The contribution of this work is the CSR data model, which covers conversions keyed by file format extension as well as software scripts, software quality measures, and test-file-specific information for evaluating software quality.
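To make the data model concrete, the following is a minimal sketch of how one conversion record and an extension-based lookup might be represented. It is an illustration only, not the CSR schema: the class names, field names, the example tools ToolA and ToolB, and the assumption that a higher quality value means a more faithful conversion are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Conversion:
    """One conversion a software package can perform (hypothetical record layout)."""
    software: str                     # software package name
    input_ext: str                    # source file format extension, e.g. "obj"
    output_ext: str                   # target file format extension, e.g. "stl"
    script: Optional[str] = None      # script that drives the software, if any
    quality: Optional[float] = None   # content-based quality measure, if one exists
    test_file: Optional[str] = None   # test file the quality measure was computed on


@dataclass
class Registry:
    """In-memory stand-in for a registry of conversions."""
    conversions: List[Conversion] = field(default_factory=list)

    def add(self, conversion: Conversion) -> None:
        self.conversions.append(conversion)

    def find(self, input_ext: str, output_ext: str) -> List[Conversion]:
        """Return conversions between two extensions, case-insensitively.

        Results are ordered by quality, highest first (assuming higher is
        better); conversions without a quality measure come last.
        """
        src, dst = input_ext.lower(), output_ext.lower()
        matches = [
            c for c in self.conversions
            if c.input_ext.lower() == src and c.output_ext.lower() == dst
        ]
        return sorted(matches, key=lambda c: (c.quality is None, -(c.quality or 0.0)))


# Example with made-up tools and quality values.
registry = Registry()
registry.add(Conversion("ToolA", "obj", "stl", quality=0.5, test_file="cube.obj"))
registry.add(Conversion("ToolB", "obj", "stl", script="toolb_convert.sh"))
candidates = registry.find("OBJ", "stl")
```

Keeping the quality measure and its test file on the conversion record, rather than on the software package, reflects the idea that quality is evaluated per conversion and per test file.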

We have populated the CSR with the help of staff at the National Archives and Records Administration (NARA). As of May 28, 2010, the CSR contained 183,142 conversions, 544 software packages, 1,316 file format extensions associated with 273 MIME types, and 154 PRONOM identifications. The registry provides multiple search services.