First, we present several central and novel contributions to the field that are implemented in Snakemake. Second, we show how Snakemake comprehensively covers data analysis needs by introducing generic workflow design patterns that can serve as blueprints for composing any kind of analysis.

Results

In the following, we describe how sustainability is achieved with Snakemake by following the central goals outlined above.

Automation

The key idea of Snakemake is that workflows are specified by decomposing them into steps, which are represented as rules (Fig. \ref{924627}). Each rule describes how to obtain a set of output files from a set of input files. This can happen via a shell command, a block of Python code, an external script (Python, R, or Julia), a Jupyter notebook (https://jupyter.org), or a so-called tool wrapper (see Sec. \ref{431950}). Wildcards make rules generic. For example, the rule select_by_country in Fig. \ref{924627}a can be applied to generate any output file of the form by-country/{country}.csv, with {country} being a wildcard that can be replaced with any non-empty string. In shell commands, input and output files, as well as additional parameters, are directly accessible by enclosing the respective keywords in curly braces (if any of these contains more than one item, individual items can be accessed by name or index).
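The wildcard and curly-brace mechanics described above can be illustrated with a minimal Python sketch. This is not Snakemake's implementation: the helper names (`pattern_to_regex`, `match_output`, `Files`, `render_shell`), the file paths, and the `select-rows` command are hypothetical and chosen only for illustration. The sketch assumes that each wildcard name occurs once per pattern and that a wildcard matches any non-empty string.

```python
import re
from types import SimpleNamespace


def pattern_to_regex(pattern):
    """Compile an output pattern such as 'by-country/{country}.csv'.

    Each {name} becomes a named group matching any non-empty string.
    Assumption: every wildcard name appears only once in the pattern.
    """
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal path fragment
        else:
            regex += f"(?P<{part}>.+)"      # wildcard: one or more characters
    return re.compile(regex)


def match_output(pattern, path):
    """Return the wildcard values if `path` fits `pattern`, else None."""
    m = pattern_to_regex(pattern).fullmatch(path)
    return m.groupdict() if m else None


class Files(list):
    """Hypothetical file list: renders space-separated, items reachable by
    index (files[0]) or by name (files.name)."""

    def __init__(self, *paths, **named):
        super().__init__(list(paths) + list(named.values()))
        for name, path in named.items():
            setattr(self, name, path)

    def __str__(self):
        return " ".join(self)


def render_shell(template, input, output, wildcards):
    """Fill {input}, {output}, and {wildcards.x} placeholders in a command."""
    return template.format(
        input=input,
        output=output,
        wildcards=SimpleNamespace(**wildcards),
    )


# The requested target determines the wildcard value.
wildcards = match_output("by-country/{country}.csv", "by-country/france.csv")

# 'select-rows' is a made-up command, used only to show the substitution.
command = render_shell(
    "select-rows --country {wildcards.country} {input.table} > {output[0]}",
    input=Files(table="data/cities.csv"),
    output=Files("by-country/france.csv"),
    wildcards=wildcards,
)
print(command)
```

In this sketch, requesting `by-country/france.csv` binds `country` to `france`, and the same generic rule template then yields a concrete command for that file. Access by name (`{input.table}`) and by index (`{output[0]}`) both rely on Python's standard format-string syntax.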