This document is a review of manuscript CiSESI-2019-01-0012 submitted to Computing in Science & Engineering:
Provenance tracking in the LHCb software (Ana Trisovic, Chris R. Jones, Ben Couturier, and Marco Clemencic).
Keywords: cisemag, reproducible research, provenance, review

Content Summary

The authors argue that the best way to foster reproducibility is to integrate it within existing scientific software that is already in use. This technique makes using reproducibility tools seamless and straightforward. They demonstrate their solution by integrating provenance tracking into the official analysis software used at the LHCb high-energy physics experiment at CERN.
Keywords - reproducible research, provenance.

Contribution

The paper proposes a provenance tracking solution for reproducibility, built into the LHCb analysis software. The stored provenance allows understanding how a file was produced and provides sufficient information to entirely reproduce the dataset, eliminating the need for the original input code or even documentation.
The paper is readable but requires some effort to digest. It covers some background material, skips over more fundamental content, and uses imprecise/inappropriate language at times.
References - References are sufficient and appropriate.

Overview

Embedding provenance in a software system is a prevalent practice, but doing it for existing and widely used scientific software is uncommon. Thus, the contribution, and how it differs from previous publications, needs to be highlighted through the related work.

Detailed Review

Introduction/Abstract/Related Work - I think the reader would benefit from a concise abstract highlighting the thesis statement and the contributions of the work. I like the one from https://arxiv.org/abs/1910.02863. Why am I not reviewing this document? The introduction starts with a list-like review of the related work and ends with a thesis statement and the authors' proposed solution.
Section "The LHCb software" - This section provides an overview of the LHCb GAUDI software framework.
Notes:
Section "Implementation of the service" - Describes the organization and methods of the metadata service.
Notes:
Section "Using the Provenance tracking service" - The section presents four use-cases, a code snippet for using MetaDataSvc in the Python configuration file, and two ways of examining the info file.
Notes:
Use-Cases
  1. latest version. How does MetaDataSvc help in reproducibility?
  2. minor tweaks to configuration. How does MetaDataSvc help in reproducibility?
  3. multiple analysts in one filesystem. How does MetaDataSvc help in reproducibility?
  4. version bug. This use-case seems tied to use-case 1. How does MetaDataSvc help in reproducibility?
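To make the question in use-case 2 concrete: one way the stored provenance could help is by comparing the recorded configuration of two runs and reporting only what changed. The sketch below is an illustration in plain Python under my own assumptions. The dictionary layout and the option names (`application_version`, `EventSelector.Input`, `Cut.PtMin`) are hypothetical and are not taken from the manuscript or from the actual MetaDataSvc output.

```python
def diff_provenance(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs.

    A key present in only one record is reported with None on the other side.
    """
    keys = set(old) | set(new)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(keys)
        if old.get(key) != new.get(key)
    }


# Hypothetical provenance records for two runs of the same job.
run_a = {
    "application_version": "v1",
    "EventSelector.Input": "a.dst",
    "Cut.PtMin": "500",
}
run_b = {
    "application_version": "v1",
    "EventSelector.Input": "a.dst",
    "Cut.PtMin": "750",
}

changes = diff_provenance(run_a, run_b)
print(changes)
```

If the info files support a comparison of this kind, stating so explicitly for each use-case would answer the "how does it help" question directly.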
Section "Conclusion" - Reiterate contributions.
Notes:

Questions

  1. How easy was this to implement in a service-based architecture/framework?
  2. Do you think this technique could be employed in other architectures as easily?
  3. The work was done in 2015 and incorporated into the GAUDI repository in 2017. Are there any end-user results? Is it widely used? Do researchers commonly use the info files to address the concerns presented in your four use-cases?