Supporting exploration and collaboration in scientific workflow systems.
Luigi Marini, Rob Kooper, Peter Bajcsy and Jim Myers
AGU Fall Meeting,
December 10-14, San Francisco, California (2007)
As the amount of observational data captured every day increases, running scientific workflows
will soon become a fundamental step of scientific inquiry. Current scientific workflow
systems offer ways to link together data, software and computational resources, but they often
require a deep understanding of the system and impose a steep learning curve.
Thus, there is a need to lower user adoption barriers for workflow systems and improve the
plug-and-play functionality of these systems.
We created a system that allows users
to easily create and share workflows, data and algorithms. By lowering user adoption
barriers, we aim to support scientific discovery and make research more efficient.
Current paradigms for workflow creation focus on visual programming with a graph-based metaphor.
This metaphor can be powerful in the hands of expert users, but it becomes daunting when graphs
grow large, when the graph includes engineering-level steps such as loading and visualizing
data, and when users are not familiar with all of the available tools.
We present
a different method of workflow creation that coexists with the standard graph-based editors.
The method builds on an exploratory interface in a macro-recording style and focuses on
the data being analyzed as the workflow is created step by step.
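To make the macro-recording idea concrete, here is a minimal sketch, assuming a recorder that runs each tool immediately on the user's current data (so the result can be inspected) and logs the call so the session can later be replayed as a workflow. The `WorkflowRecorder` and `Step` classes, the tool names, and the 2.0 cutoff are hypothetical illustrations, not Cyberintegrator's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One recorded action: the tool that ran and its parameters."""
    tool: str
    params: Dict[str, Any]


@dataclass
class WorkflowRecorder:
    """Hypothetical macro recorder: runs tools interactively, logs each call."""
    tools: Dict[str, Callable[..., Any]]
    steps: List[Step] = field(default_factory=list)

    def run(self, tool: str, data: Any, **params: Any) -> Any:
        # Execute right away so the user can inspect the intermediate result.
        result = self.tools[tool](data, **params)
        # Record the step so the exploratory session becomes a reusable workflow.
        self.steps.append(Step(tool, dict(params)))
        return result

    def replay(self, data: Any) -> Any:
        # Re-apply every recorded step, in order, to new input data.
        for step in self.steps:
            data = self.tools[step.tool](data, **step.params)
        return data


# Illustrative tools; the names and the 2.0 cutoff are hypothetical.
tools = {
    "drop_missing": lambda xs: [x for x in xs if x is not None],
    "below": lambda xs, limit: [x for x in xs if x < limit],
}

rec = WorkflowRecorder(tools)
clean = rec.run("drop_missing", [2.1, None, 5.4, 1.8])
low = rec.run("below", clean, limit=2.0)
print(low)                            # [1.8]
print(rec.replay([3.0, None, 0.5]))   # [0.5]
```

The design choice is that the user never draws a graph: the ordered list of recorded steps is the workflow, and a graph view could be derived from it afterwards for editors that prefer one.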
Using flexible, platform-independent open standards instead of system-specific data structures
would make systems easier to extend and would give external applications a simple interface
to query and analyze the data and metadata produced.
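As a small illustration of why a shared, open data model helps external applications, the following is a minimal sketch assuming metadata is held as RDF-style subject-predicate-object triples. The in-memory `store`, the `match` helper, the `ex:` prefix, and all identifiers are hypothetical; only the `dc:` names follow Dublin Core conventions. A production system would more likely expose a standard query language such as SPARQL.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if ((s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)):
            yield t


# Hypothetical metadata; identifiers are illustrative only.
store = [
    ("dataset:do-2007-06", "dc:creator", "field-team-a"),
    ("dataset:do-2007-06", "dc:format", "text/csv"),
    ("dataset:salinity-2007-06", "dc:format", "text/csv"),
    ("workflow:hypoxia-check", "ex:uses", "dataset:do-2007-06"),
]

# An external tool asks: which datasets are CSV, and which workflows use a dataset?
csv_sets = sorted(s for s, _, _ in match(store, p="dc:format", o="text/csv"))
users = [s for s, _, _ in match(store, p="ex:uses", o="dataset:do-2007-06")]
print(csv_sets)  # ['dataset:do-2007-06', 'dataset:salinity-2007-06']
print(users)     # ['workflow:hypoxia-check']
```

Because every record has the same three-part shape, the same pattern query works for data descriptions and workflow provenance alike, without knowledge of the producing system's internal structures.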
We have explored and implemented a system that stores workflows and related metadata
using the Resource Description Framework (RDF) metadata model and that is built on top of
the Tupelo data and metadata archiving system. The scientific workflow system connects
to shared content repositories, where users can easily share data, workflows, algorithms
and annotations.
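As a rough illustration of storing a workflow with the RDF model, the sketch below writes a two-step workflow as N-Triples, a W3C line-based RDF serialization that RDF-aware tools can load. It does not use Tupelo's actual API; the `http://example.org/wf#` vocabulary, the `Lit` marker class, and the helper functions are hypothetical, while `rdf:type` and the Dublin Core `title` property are standard terms.

```python
from typing import List, Tuple


class Lit(str):
    """Marks a value as an RDF literal instead of an IRI."""


Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
EX = "http://example.org/wf#"  # hypothetical workflow vocabulary


def term_to_nt(term: str) -> str:
    """Render one term in N-Triples syntax: literals quoted, IRIs in <>."""
    if isinstance(term, Lit):
        escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{term}>"


def workflow_triples(wf_id: str, title: str, tools: List[str]) -> List[Triple]:
    """Describe a workflow and its ordered steps as RDF triples."""
    wf = EX + wf_id
    triples: List[Triple] = [(wf, RDF_TYPE, EX + "Workflow"),
                             (wf, DC_TITLE, Lit(title))]
    for i, tool in enumerate(tools, start=1):
        step = f"{wf}-step{i}"
        triples += [(wf, EX + "hasStep", step),
                    (step, EX + "tool", Lit(tool)),
                    # Plain literal for brevity; a real vocabulary might use xsd:integer.
                    (step, EX + "order", Lit(str(i)))]
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples, one statement per line, each ending in ' .'."""
    return "".join(f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} .\n"
                   for s, p, o in triples)


nt = to_ntriples(workflow_triples("hypoxia-check", 'Flag "low" DO',
                                  ["drop_missing", "below"]))
print(nt)
```

Serializing to a standard format like this is what lets a shared repository hold workflows, data descriptions and annotations side by side, since each is just another set of triples.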
These methods will be illustrated using a prototype
workflow solution called Cyberintegrator and a use case scenario being developed by
the Corpus Christi Bay WATERS Network testbed (a group of collaborating domain scientists
from Texas and Illinois) that involves monitoring, predicting and understanding the hypoxia
problem in Corpus Christi Bay.