Lecture Notes in Computer Science Volume 5272, 2008, pp 208-220
Provenance and Annotation of Data and Processes
Allan J. MacKenzie-Graham, Arash Payan, Ivo D. Dinov, John D. Van Horn, Arthur W. Toga
Provenance, the description of the history of a set of data, has become important in the neurosciences with the proliferation of neuroimaging efforts associated with research consortia. Knowledge about the origin, preprocessing, analysis, and post hoc processing of neuroimaging volumes is essential for establishing the quality of data and results, the reproducibility of findings, and their scientific interpretation. Neuroimaging provenance also includes the specifics of the software routines, algorithmic parameters, and operating system settings that were employed in the analysis protocol. The LONI Pipeline (http://pipeline.loni.ucla.edu) is a Java-based workflow environment for the construction and execution of data processing streams. We have developed a provenance framework, integrated with the LONI Pipeline workflow environment, for describing the current and retrospective state of data. Collecting provenance information under this framework relieves the user of much of the burden of documentation while still providing a rich description of an image’s characteristics and of the programs that interacted with that data. This combination of ease of use and highly descriptive meta-data will greatly facilitate the collection of provenance information from brain imaging workflows, encourage subsequent data and meta-data sharing, enhance peer-reviewed publication, and support multi-center collaboration.