Supporting exploration and collaboration in scientific workflow systems.

Luigi Marini, Rob Kooper, Peter Bajcsy and Jim Myers

AGU, Fall Meeting, December 10-14, San Francisco, California (2007)

As the amount of observation data captured every day increases, running scientific workflows will soon become a fundamental step of scientific inquiry. Current scientific workflow systems offer ways to link together data, software and computational resources, but they often require a deep understanding of the system and impose a steep learning curve. Thus, there is a need to lower user adoption barriers for workflow systems and improve the plug-and-play functionality of these systems.

We created a system that allows the user to easily create and share workflows, data and algorithms. By lowering user adoption barriers, we aim to support discoveries and to provide a means of conducting research more efficiently. Current paradigms for workflow creation focus on visual programming using a graph-based metaphor. This can be a powerful metaphor in the hands of expert users, but it can become daunting when graphs become large, when the graph includes engineering-level steps such as loading and visualizing data, and when users are not familiar with all the available tools.

We present a different method of workflow creation that coexists with the standard graph-based editors. The method builds on an exploratory interface using a macro-recording style, and focuses on the data being analyzed during the step-by-step creation of the workflow. Rather than storing data in system-specific data structures, using more flexible, platform-independent open standards would create systems that are easier to extend and that provide a simple interface for external applications to query and analyze the data and metadata produced.
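
The macro-recording idea can be illustrated with a minimal sketch. It is not Cyberintegrator code: the `Recorder` and `Step` classes, the step names and the sample readings are all hypothetical, chosen only to show how steps applied during exploration might be captured and later replayed as a workflow.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Step:
    """One recorded step: a label and the function that was applied."""
    name: str
    func: Callable[[Any], Any]


@dataclass
class Recorder:
    """Hypothetical recorder: applies a step to the data and remembers it."""
    steps: List[Step] = field(default_factory=list)

    def apply(self, name: str, func: Callable[[Any], Any], data: Any) -> Any:
        result = func(data)                   # run the step on the current data
        self.steps.append(Step(name, func))   # record it only if it succeeded
        return result

    def replay(self, data: Any) -> Any:
        """Re-run the recorded steps, in order, on a new input."""
        for step in self.steps:
            data = step.func(data)
        return data


# Exploratory session on made-up sensor readings (None marks a missing value).
recorder = Recorder()
readings = [4.1, None, 3.8, 2.0]
cleaned = recorder.apply(
    "drop_missing", lambda xs: [x for x in xs if x is not None], readings
)
mean = recorder.apply("mean", lambda xs: sum(xs) / len(xs), cleaned)

# The recorded steps now form a reusable workflow for another data set.
other_mean = recorder.replay([1.0, None, 3.0])
```

In this sketch the user only ever works with the data in hand; the workflow is a by-product of the session rather than something drawn as a graph in advance.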

We have explored and implemented a system that stores workflows and related metadata using the Resource Description Framework (RDF) metadata model and that is built on top of the Tupelo data and metadata archiving system. The scientific workflow system connects to shared content repositories, where users can easily share data, workflows, algorithms and annotations.
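
As a rough illustration of the RDF model, a workflow and its metadata can be described as subject-predicate-object triples. The sketch below uses plain Python tuples and does not use the Tupelo API; the `http://example.org/workflow#` namespace, the predicate names and the `add` and `query` helpers are assumptions made for the example.

```python
# Hypothetical namespace and vocabulary, for illustration only.
EX = "http://example.org/workflow#"
WF = EX + "hypoxiaWorkflow"
HAS_STEP = EX + "hasStep"
USES_TOOL = EX + "usesTool"
CREATOR = EX + "creator"

triples = set()  # an in-memory stand-in for a shared triple store


def add(subject, predicate, obj):
    """Store one (subject, predicate, object) statement."""
    triples.add((subject, predicate, obj))


def query(subject=None, predicate=None, obj=None):
    """Return matching triples, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Describe a two-step workflow and who created it.
add(WF, CREATOR, "alice")
add(WF, HAS_STEP, EX + "step1")
add(WF, HAS_STEP, EX + "step2")
add(EX + "step1", USES_TOOL, "load_sensor_data")
add(EX + "step2", USES_TOOL, "plot_dissolved_oxygen")

# An external application can ask which steps belong to the workflow
# without knowing anything about the system's internal data structures.
steps_of_workflow = [o for (_, _, o) in query(subject=WF, predicate=HAS_STEP)]
```

Because every statement has the same three-part shape, new kinds of metadata such as annotations can be added without changing a schema, which is the extensibility argument made above.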

Examples of the above methodologies will be illustrated using a prototype workflow solution called Cyberintegrator and a use case scenario being developed by the Corpus Christi Bay WATERS Network test bed (a group of collaborating domain scientists from Texas and Illinois) involving monitoring, predicting and understanding of the hypoxia problem in Corpus Christi Bay.