- The use of the Chemical JSON format seems essential for the inner workings of the platform described in the manuscript. It is my impression, however, that the format is not as widely known as it should be. Are there standard examples in your repositories of how to write a Chemical JSON file from commonly used compiled languages, such as C++, C, and Fortran? Are there any quantum chemistry codes that can already emit their output in this format? Or is there an intermediate Python layer that translates from, e.g. a checkpoint file, to Chemical JSON?
There is JavaScript/TypeScript in the web client code, C++ in AvogadroLibs, and wrapped C++ from AvogadroLibs that can convert checkpoint files to Chemical JSON. I think it is fair to say the format is essential for the inner workings of the platform, but the intent is to offer it in addition to other formats for import/export of data. The Chemical JSON GitHub repository (https://github.com/openchemistry/chemicaljson) is mentioned when discussing examples.
- Is there a formal standardization process for the Chemical JSON format in place? Who is participating? Could you describe the workflow used in the definition of the open standard?
There is no formal process; the format is maintained in a repository referenced in the paper. It is a working format developed to support several projects that can move quite quickly. The authors (Hanwell and de Jong) helped organize an initial workshop and started discussions with MolSSI shortly after MolSSI was founded to spur standardization.
- How widespread is the adoption of Chemical JSON so far? Could it be merged with the QCSchema efforts of the MolSSI?
It is already used in a number of codebases. Ultimately it could be merged, but during efforts to standardize it became clear that the velocity of QCSchema was too slow to accommodate projects with a shorter development timeframe. The primary effort is to ensure that QCSchema and similar schemas support everything in Chemical JSON. We have added text to make this approach clearer; thank you for the questions. Please see the final paragraph of "Handling Data and Metadata" for some further detail elicited from this line of questioning.
- Are there any limitations to the format? I could think that storing basis set and MO coefficients information for very large molecules would make it rather impractical. Is this the case? If yes, how do you plan to solve this problem?
Absolutely. This is discussed in the early part of "Handling Data and Metadata", and a concluding remark in the new paragraph makes clear our belief that binary formats are essential for large calculations. JSON offers valuable room to explore the space before mapping to binary formats that share similar structures, such as MessagePack or the more traditional HDF5.
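To make the size concern concrete, here is a minimal sketch in Python using only the standard library. The key names in the small document (`chemicalJson`, `atoms`, `elements`, `coords`) follow our reading of the examples in the Chemical JSON repository and should be treated as assumptions, not a normative schema. The coefficient array is synthetic, and raw float64 packing via `struct` stands in for a binary container such as MessagePack or HDF5; it is not either of those formats.

```python
import json
import struct

# Illustrative water molecule. Key names are assumptions based on the
# Chemical JSON examples, not a normative schema; coordinates are
# approximate and given as a flat x, y, z list per atom.
water = {
    "chemicalJson": 1,
    "name": "water",
    "atoms": {
        "elements": {"number": [8, 1, 1]},
        "coords": {"3d": [0.0, 0.0, 0.1173,
                          0.0, 0.7572, -0.4692,
                          0.0, -0.7572, -0.4692]},
    },
}


def json_size(values):
    """Bytes needed to store a list of floats as UTF-8 JSON text."""
    return len(json.dumps(values).encode("utf-8"))


def binary_size(values):
    """Bytes needed to store the same floats as little-endian float64."""
    return len(struct.pack(f"<{len(values)}d", *values))


# Synthetic stand-in for an MO coefficient matrix (n_basis x n_basis).
n_basis = 200
coeffs = [((i * 7919) % 10007) / 10007.0 - 0.5
          for i in range(n_basis * n_basis)]

text_bytes = json_size(coeffs)
raw_bytes = binary_size(coeffs)

if __name__ == "__main__":
    print("document round-trips:", json.loads(json.dumps(water)) == water)
    print("JSON text bytes:     ", text_bytes)
    print("float64 binary bytes:", raw_bytes)
```

The small structural part of the document stays readable as JSON, while the dense coefficient array, which grows quadratically with the number of basis functions, is the part that benefits from a binary encoding.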
- What about QM/MM simulations? How would the format need to be extended, if at all?
These are not addressed at this stage, although they would certainly be useful additions to the platform. For QM/MM, the format extensions would include labeling atoms as quantum or classical and adding a force field description. Some of these are currently being considered in QCSchema.
- Are there any plans/thoughts to support output from non-GTO-based codes? For example, numerical basis sets, plane waves, multiwavelets.
Not at this time.
- Are there guides available on how to deploy the platform on local HPC infrastructure?
This is the biggest gap right now. We have guides for local single-machine developer deployments, AWS deployments with cloud clusters, and a SPIN-based deployment using NERSC at LBNL. Deployment on local HPC infrastructure should be possible, but the team has not explored it, having focused primarily on local development and the NERSC/AWS deployments. As with many things, this could certainly be accomplished given further development time.
- The MolSSI QCArchive initiative has an overlap of functionalities with the platform described in this paper, or so it seems to me. Would you clarify the relationship between your platform and QCArchive? What are the use cases for which your platform is specifically designed? Could the platform benefit from integration with QCArchive? Would QCArchive benefit from integration with your platform?
MolSSI and QCArchive arrived after we began working on this project, and there are some distinct differences. During early discussions it was clear that we focus on individual calculations and retain more data so that, for example, electronic structure can be visualized. We also enable search based on name, InChI, SMILES, etc., which is (or at least was) difficult in QCArchive. QCArchive focuses on large parameter sweeps and on generating inputs for machine learning/MD potentials.
- I think the manuscript would benefit from a short description of the code development workflow used by the developers.
Good point; we added a description in the final paragraph of the introduction.
- The authors should put more emphasis on the fact that quantum chemical program packages in the backend are accessed through Docker/Singularity/Shifter images. Containers make it possible to share the code used to generate computational results without violating licenses. Even without the (undue, in my opinion) barrier imposed by licenses, some authors find unpleasant any obligation to share their research code. The use of containers removes a significant barrier not only for reproducibility, but also for collaboration. This is, to me, an extremely compelling feature of the platform and I think it deserves to be highlighted more in the manuscript.
Thank you, we agree on their importance. The section on "Containers for Chemistry Codes" discusses the various containers and why more than one kind is needed at this time. I added two paragraphs at the end of the introduction section to highlight some of these points. Thank you for your suggestions; we wholeheartedly agree.