Containers for Chemistry Codes

The use of software containers has revolutionized how services are deployed on the internet, and science is beginning to reap some of these rewards. At their core, container technologies such as Docker\cite{merkel2014}, Singularity\cite{Kurtzer2017}, and Shifter\cite{Gerhardt_2017} provide self-contained Linux-based binary environments that contain everything from the Linux system call interface through to the full stack of software libraries. This enables a complete build environment to be developed from a known base image, specifying the packages to install and the software repositories to clone, build, and install.
The software images have an entry point that can be called from the host operating system, and a flexible set of capabilities for mounting file systems and moving data in and out. A Dockerfile provides a full description of the base image, build steps, and configuration. The binary images can be uploaded to Docker Hub and similar services, and downloaded using standard tools developed by the community. The container specification developed for the project offers a standard method of invoking a code: input directives are delivered in a simple JSON format, the geometry is supplied as Chemical JSON, and the output is translated to either a simple JSON format or Chemical JSON. All relevant files are in the ``docker'' folder in the deployment repository \cite{openchemistrymongochemdeploy}, with a folder for each of the containers.
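As an illustration of the invocation convention described above, the following Python sketch parses a JSON parameter document and a Chemical JSON geometry and returns a JSON summary. It is a minimal sketch under stated assumptions, not the project's entry point: the function names, the parameter keys (\texttt{task}, \texttt{theory}), and the output keys are hypothetical, and only the \texttt{atoms.elements.number} and \texttt{atoms.coords.3d} fields of Chemical JSON are read.

```python
import json

# Atomic numbers mapped to symbols for the few elements used in this sketch.
SYMBOLS = {1: "H", 6: "C", 7: "N", 8: "O"}


def read_geometry(cjson):
    """Return element symbols and (x, y, z) tuples from a Chemical JSON dict."""
    numbers = cjson["atoms"]["elements"]["number"]
    flat = cjson["atoms"]["coords"]["3d"]  # flat list: x0, y0, z0, x1, ...
    if len(flat) != 3 * len(numbers):
        raise ValueError("expected three coordinates per atom")
    coords = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    symbols = [SYMBOLS.get(n, str(n)) for n in numbers]
    return symbols, coords


def run(parameters_text, geometry_text):
    """Hypothetical entry point: parse both inputs and return output JSON text."""
    parameters = json.loads(parameters_text)
    symbols, coords = read_geometry(json.loads(geometry_text))
    # A real container would write the code's native input and execute it here.
    output = {
        "task": parameters.get("task", "energy"),  # assumed parameter key
        "theory": parameters.get("theory"),        # assumed parameter key
        "atomCount": len(coords),
        "elements": symbols,
    }
    return json.dumps(output, sort_keys=True)


if __name__ == "__main__":
    water = {
        "chemicalJson": 1,
        "atoms": {
            "elements": {"number": [8, 1, 1]},
            "coords": {"3d": [0.0, 0.0, 0.0, 0.0, 0.76, 0.59, 0.0, -0.76, 0.59]},
        },
    }
    print(run('{"task": "energy", "theory": "hf"}', json.dumps(water)))
```

Keeping the parsing separate from the execution step means the same wrapper shape could, in principle, be reused for each code, with only the translation to and from the native formats changing.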
Docker was the first widely adopted container standard, and it acts as a base for the images used in this project. It provides network isolation, which is not necessary for most computational chemistry tasks or for HPC tasks in general. The Singularity project was founded to address this issue, concentrating on the execution of binary programs without the need for privileged system access, which is often not available on HPC clusters and supercomputers. Shifter is a solution developed at NERSC for executing containers within their supercomputing environment. If Singularity were available in all environments it would probably be the only container format used; at present this is not the case, so multiple container types are supported depending upon deployment requirements. The Docker images are used to generate the Singularity and Shifter containers, so that it is only necessary to maintain a single image for each code.
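To make the single-image workflow concrete, the sketch below assembles the commands that would derive a Singularity image file and a Shifter image from one Docker image reference. It assumes the standard \texttt{singularity build} and \texttt{shifterimg pull} command-line tools; the helper name, the image name \texttt{example/chemcode:latest}, and the output file name are hypothetical, and the project's own conversion scripts may differ.

```python
import shlex


def conversion_commands(image, sif_path):
    """Build the commands that derive Singularity and Shifter images
    from a single Docker image reference such as 'user/name:tag'."""
    return {
        # Singularity converts a registry image into a local image file.
        "singularity": ["singularity", "build", sif_path, f"docker://{image}"],
        # Shifter pulls the image through its image gateway.
        "shifter": ["shifterimg", "pull", f"docker:{image}"],
    }


if __name__ == "__main__":
    # Hypothetical image and output names, for illustration only.
    commands = conversion_commands("example/chemcode:latest", "chemcode.sif")
    for runtime, command in commands.items():
        print(f"{runtime}: {shlex.join(command)}")
```

The commands are returned as argument lists so that they can be passed directly to \texttt{subprocess.run} on a system where the tools are installed, without shell quoting concerns.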
The containers developed to support the project go beyond providing a useful delivery mechanism for executable code. They reduce the barriers to collaboration, and enable binary containers to be used as reference implementations that can easily move between host systems. Private container registries can host images containing proprietary software, and automated builds make it possible to share codes whose complex dependencies and build systems would otherwise hamper seamless testing of changes. As supercomputers, HPC clusters, and cloud environments evolve, these advantages of containers will only become clearer to the wider community.