About

Towards a Universal File Format Converter

Motivated by the large number of file formats within the 3D file domain, NCSA Polyglot was created to provide an extensible, scalable, and quantifiable means of converting between formats. The system is extensible in that new conversion software can be incorporated easily, scalable in that the workload can be distributed among parallel machines, and quantifiable in that it has a built-in framework for measuring information loss across conversions.

The approach commonly taken by software writers to handle format conversions is to directly support a number of formats by implementing loaders according to documented specifications. This is an arduous and error-prone task, as many formats have large, complex specifications. It may also be a nearly impossible task in general, as there are many file formats within each file domain and many of the specifications are closed. We instead take a different approach. Vendors of proprietary formats will support their format within their own software. Since the format was created by the vendor, the vendor's loader should also be the best implementation of a loader for that format one could hope for. It is also generally the case that most software applications support importing from and exporting to some subset of the file formats within their domain. This means that there exists a way of getting into and out of these closed formats.

On top of this we emphasize the benefits and necessity of code reuse. Reusing code not only saves programmers the time of reinventing the proverbial wheel but also provides robustness, since the code is well tested and widely used. On the other hand, software vendors rarely find it in their interest to allow for the reuse of the code within their applications. Because of this there is often no convenient API available for programmers to access this code. However, a program must be useful, and it must be useful to human beings. Thus, there is always some form of access to the underlying functionality. For modern applications this is a graphical user interface.

We introduce the notion of imposed code reuse. We define this as the wrapping of 3rd party software, utilizing whatever interfaces the software vendors make available, to provide API-like access to embedded code. NCSA Polyglot, our attempt towards a universal file format converter, uses imposed code reuse to perform individual conversions. Using the AutoHotKey (ahk) scripting language we wrap 3rd party applications containing any bit of conversion functionality. Applications that have an "open" operation, a "save" operation, an "import" operation, or any other operation meaning the same thing can be utilized as a converter between formats.

By conforming to a small set of conventions these scripted applications can be used to create an I/O-Graph, a data structure which stores a number of input/output operators. For file format conversion this graph, shown below, stores as vertices the union of all file formats supported by the 3rd party software being utilized. Edges are directed and represent a conversion from a source format to a target format by one of the applications. Paths between the vertices of the I/O-Graph represent conversions that can be achieved by chaining the conversions available within single applications. We use this graph, created from the ahk scripts, together with the scripts themselves to automate conversions.

An I/O-Graph, with vertices representing a number of file formats and edges representing a conversion between a source and target format. The highlighted edges indicate a conversion path between the *.stp and *.lwo file formats given the 3rd party applications represented within the graph.
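
The I/O-Graph idea described above can be illustrated with a small sketch. This is not Polyglot's actual implementation: the class name IOGraph, its methods, and the application names "AppA" and "AppB" with their supported formats are all hypothetical, and Python is used only for brevity. The sketch also assumes that the preferred path is the one with the fewest conversions and finds it with a breadth-first search; the text above does not say how Polyglot chooses among alternative paths.

```python
from collections import defaultdict, deque


class IOGraph:
    """Directed graph: vertices are file formats, edges are conversions.

    Hypothetical illustration only; not the structure used by Polyglot.
    """

    def __init__(self):
        # source format -> list of (target format, application name)
        self.edges = defaultdict(list)

    def add_application(self, app, inputs, outputs):
        """Register an application that can open `inputs` and save `outputs`."""
        for src in inputs:
            for dst in outputs:
                if src != dst:
                    self.edges[src].append((dst, app))

    def find_path(self, source, target):
        """Breadth-first search for a chain with the fewest conversions.

        Returns a list of (application, source format, target format)
        steps, an empty list if no conversion is needed, or None if the
        target cannot be reached.
        """
        if source == target:
            return []
        parent = {source: None}  # format -> (previous format, application)
        queue = deque([source])
        while queue:
            fmt = queue.popleft()
            for nxt, app in self.edges[fmt]:
                if nxt in parent:
                    continue  # already reached by a path at least as short
                parent[nxt] = (fmt, app)
                if nxt == target:
                    return self._unwind(parent, target)
                queue.append(nxt)
        return None

    @staticmethod
    def _unwind(parent, target):
        """Walk the parent links back to the source and reverse them."""
        steps = []
        fmt = target
        while parent[fmt] is not None:
            prev, app = parent[fmt]
            steps.append((app, prev, fmt))
            fmt = prev
        steps.reverse()
        return steps


# Hypothetical applications and format support, for illustration only.
graph = IOGraph()
graph.add_application("AppA", inputs=["stp"], outputs=["obj"])
graph.add_application("AppB", inputs=["obj"], outputs=["lwo", "ply"])

for app, src, dst in graph.find_path("stp", "lwo"):
    print(f"{app}: *.{src} -> *.{dst}")
```

In this toy graph no single application converts *.stp to *.lwo directly, so the search chains the two applications through *.obj. In a real deployment each step would correspond to running the wrapper script for that application; a weighted search could be substituted if edges carried a cost such as measured information loss.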

NCSA Polyglot is a web-based service, since the user's desktop must be protected from windows popping up automatically and from the mouse being taken over by the various automated GUIs. The required conversion software and the Polyglot server daemon that runs it sit on one or more parallel machines devoted to file format conversions. Users can currently access the services of a Polyglot server in three ways: through a web interface (shown below), through a small command line program similar to ImageMagick, or from within Java applications through a provided API.

Left: The web interface to a Polyglot server. Users can drag and drop files to the top area, select the output format in the list, click "upload", and download the resulting files when they are available. Right: Given a "universal converter" one can consider things like a "universal viewer". In this web interface to a Polyglot server a user can again drag and drop files to the top area. This time, however, there is no selection of the output format; it is hardcoded to one that can be displayed in the browser. When the user presses the "upload" button the files are converted and displayed in the area below. In the example shown, the files are 3D files and are displayed using our included applet, which displays *.obj files. The interactive applet allows users to manipulate the resulting 3D objects.

Our locally running Polyglot server is accessible at http://polyglot1.ncsa.illinois.edu. The installation files to set up your own Polyglot server are available from our download page. Note that because NCSA Polyglot was originally motivated by the need to convert 3D file formats, and most high-end 3D applications are Windows-based, Polyglot servers must currently be Windows-based.


People, Publications, Presentations

Team members

  • Peter Bajcsy
    Research group ISDA, National Center for Supercomputing Applications, University of Illinois
  • Kenton McHenry
    ISDA, NCSA, University of Illinois
  • Rob Kooper
    ISDA, NCSA, University of Illinois

Publications and presentations

  • K. McHenry and P. Bajcsy, "3D+Time File Formats", Technical Report NCSA-ISDA10-001, October 15, 2010. [pdf 215kB]
  • K. McHenry, R. Kooper and P. Bajcsy, "Towards a Universal, Quantifiable, and Scalable File Format Converter", 5th International IEEE eScience conference (IEEE e-Science 2009), Oxford, UK, December 9-11, 2009. [abstract][pdf 580kB]
  • K. McHenry and P. Bajcsy, "Framework Converts Files of Any Format", Society of Photo-Optical Instrumentation Engineers (SPIE) Newsroom, 2009.
  • K. McHenry and P. Bajcsy, "3D Data Analysis", WVU/NETL/ERA Workshop on Digital Preservation of Complex Engineering Data, Morgantown, WV, USA, April 21-22, 2009.
  • K. McHenry and P. Bajcsy, "An Overview of 3D Data Content, File Formats and Viewers", Technical Report NCSA-ISDA08-002, October 31, 2008. [pdf 330kB]
For additional publications related to this research, please visit the ERA research web site at
http://www.archives.gov/era/research/research-publications.html