First, we present several central and novel contributions to the field that are implemented in Snakemake. Second, we show how Snakemake comprehensively covers data analysis needs by introducing generic workflow design patterns that can serve as blueprints for composing any kind of analysis.
In the following, we describe how sustainability is achieved with Snakemake by following the central goals outlined above.
The key idea of Snakemake is that a workflow is specified by decomposing it into steps, each of which is represented as a rule (Fig.~\ref{924627}). Each rule describes how to obtain a set of output files from a set of input files. This can happen via a shell command, a block of Python code, an external script (Python, R, or Julia), a Jupyter notebook (https://jupyter.org), or a so-called tool wrapper (see Sec.~\ref{431950}).
Through the use of wildcards, rules can be generic. For example, the rule \texttt{select\_by\_country} in Fig.~\ref{924627}a can be applied to generate any output file of the form \texttt{by-country/\{country\}.csv}, with \texttt{\{country\}} being a wildcard that can be replaced with any non-empty string.
In shell commands, the input and output files, as well as additional parameters, are directly accessible by enclosing the respective keyword in curly braces (e.g., \texttt{\{input\}}, \texttt{\{output\}}, or \texttt{\{params\}}). If any of these contains more than a single item, individual items can be accessed by name or by index (e.g., \texttt{\{input[0]\}}).
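The wildcard mechanism described above can be illustrated with a minimal sketch in plain Python. It is not Snakemake's actual implementation: the helper names \texttt{pattern\_to\_regex} and \texttt{match\_wildcards} are hypothetical, and the sketch assumes that each wildcard name occurs only once in a pattern and ignores features such as wildcard constraints. Each \texttt{\{name\}} in an output pattern is turned into a named regular-expression group matching any non-empty string, so that requesting a concrete file such as \texttt{by-country/germany.csv} yields the wildcard value \texttt{germany} for \texttt{country}.

```python
import re
from typing import Dict, Optional

# Hypothetical helpers for illustration only; not part of Snakemake's API.
# Matches a wildcard placeholder such as {country} inside an output pattern.
WILDCARD = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile an output pattern like 'by-country/{country}.csv' into a regex.

    Literal text is escaped; each {name} becomes a named group matching any
    non-empty string (.+). Assumes every wildcard name occurs only once.
    """
    parts = []
    last = 0
    for m in WILDCARD.finditer(pattern):
        parts.append(re.escape(pattern[last:m.start()]))  # literal text before the wildcard
        parts.append(f"(?P<{m.group(1)}>.+)")             # the wildcard itself
        last = m.end()
    parts.append(re.escape(pattern[last:]))                # literal text after the last wildcard
    return re.compile("".join(parts))


def match_wildcards(pattern: str, path: str) -> Optional[Dict[str, str]]:
    """Return the wildcard values if the whole path matches, else None."""
    m = pattern_to_regex(pattern).fullmatch(path)
    return m.groupdict() if m else None


print(match_wildcards("by-country/{country}.csv", "by-country/germany.csv"))
# {'country': 'germany'}
print(match_wildcards("by-country/{country}.csv", "by-country/.csv"))
# None: the wildcard must match a non-empty string
```

Because \texttt{.+} also matches slashes, a request such as \texttt{by-country/eu/germany.csv} would bind \texttt{country} to \texttt{eu/germany} in this sketch.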
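The curly-brace access in shell commands described above resembles Python's format-string syntax. The following minimal sketch mimics this behavior with plain \texttt{str.format}; the class \texttt{NamedFiles}, the helper \texttt{render\_shell}, and all file names are hypothetical and do not reproduce Snakemake's internal classes. A collection of files expands to all its items separated by spaces, while individual items can be selected by name (\texttt{\{input.table\}}) or by index (\texttt{\{input[1]\}}).

```python
# Hypothetical, simplified stand-in for a rule's input/output/params
# collections; this is not Snakemake's own class.
class NamedFiles(list):
    """A list whose items can also be reached by name.

    str() joins all items with spaces, so "{input}" expands to every file,
    "{input[0]}" to the first one and "{input.table}" to the one named table.
    """

    def __init__(self, *positional: str, **named: str) -> None:
        super().__init__([*positional, *named.values()])
        for name, value in named.items():
            setattr(self, name, value)  # enables attribute access by name

    def __str__(self) -> str:
        return " ".join(str(item) for item in self)


def render_shell(template: str, **namespaces: NamedFiles) -> str:
    """Expand {input}, {input[0]}, {input.name}, ... with str.format."""
    return template.format(**namespaces)


# Hypothetical file names for a rule with two named inputs and one output.
inp = NamedFiles(table="data/worldwide.csv", codes="data/codes.csv")
out = NamedFiles("by-country/germany.csv")

print(render_shell("cat {input} > {output}", input=inp, output=out))
# cat data/worldwide.csv data/codes.csv > by-country/germany.csv
print(render_shell("paste {input.table} {input[1]} > {output[0]}",
                   input=inp, output=out))
# paste data/worldwide.csv data/codes.csv > by-country/germany.csv
```

Subclassing \texttt{list} provides index access directly; since the names are set as per-instance attributes, a name such as \texttt{count} would shadow the corresponding list method in this sketch.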