Optimization of the protocol
As with any metabarcoding project, the first important step is to design
the primers carefully to amplify the entire target community with
minimum bias, and we used a large database of known gene sequences to
achieve this. Another consideration that is shared with other approaches
is the choice of polymerase for PCR. For the samples studied here, with
abundant template DNA, the proofreading enzyme was clearly superior in
performance, although more costly. On the other hand, this enzyme may
provide less robust amplification when the template is weak, as we have
observed in another project aimed at rhizobial DNA in soil. The use of
UMIs introduces other design considerations. We used twelve random
nucleotides (with some constraints), giving over four million potential
UMI sequences, which was sufficient for the scale of our studies, but it
would be simple to increase the UMI length if greater sequencing depth
were planned. In any metabarcoding study, the choice of sequencing depth
is, to some degree, made blindly because the diversity of templates is
not known in advance, but UMI-based approaches need greater depth
because it is UMIs that are counted, not reads, and the aim is to have
several reads per UMI. There are many factors that affect the average
number of reads per UMI, but our study is encouraging in that, without
separate optimization, all of our target genes in all of our samples
gave usable data. In fact, the number of reads per UMI was suboptimal
in most cases. Given a fixed sequencing effort, reads per UMI could, if
necessary, be increased by reducing the concentration of the forward
UMI-bearing primer and/or of the sample DNA so that fewer distinct UMIs
were initiated. With our parameters, a UMI is counted only if it has at
least two reads, and a sufficient fraction of the UMIs need at least
four reads so that some will have a secondary sequence as well as the
primary sequence: four reads (three primary, one secondary) is the
fewest at which the primary sequence can still have at least two reads
more than the secondary.
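
The read thresholds described above can be illustrated with a minimal
sketch. It is not the pipeline used in this study. It assumes reads have
already been reduced to (UMI, sequence) pairs, and it reads the rule as:
a UMI is accepted when its most frequent (primary) sequence has at least
two reads and at least two reads more than the next most frequent
(secondary) sequence. The function name call_umis and the parameter
names min_reads and min_margin are hypothetical.

```python
from collections import Counter, defaultdict


def call_umis(reads, min_reads=2, min_margin=2):
    """Return {umi: primary_sequence} for UMIs that pass the thresholds.

    reads      -- iterable of (umi, sequence) pairs (assumed input format)
    min_reads  -- assumed minimum reads supporting the primary sequence
    min_margin -- assumed minimum lead of primary over secondary sequence
    """
    # Tally how many reads support each sequence within each UMI.
    by_umi = defaultdict(Counter)
    for umi, seq in reads:
        by_umi[umi][seq] += 1

    accepted = {}
    for umi, counts in by_umi.items():
        ranked = counts.most_common(2)
        primary_seq, primary_n = ranked[0]
        # A UMI with a single distinct sequence has no secondary.
        secondary_n = ranked[1][1] if len(ranked) > 1 else 0
        if primary_n >= min_reads and primary_n - secondary_n >= min_margin:
            accepted[umi] = primary_seq
    return accepted


# Toy data: UMI and sequence labels are placeholders, not real sequences.
example_reads = [
    ("AAA", "s1"), ("AAA", "s1"),                                # 2 reads, no secondary: kept
    ("CCC", "s1"),                                               # 1 read: dropped
    ("GGG", "s2"), ("GGG", "s2"), ("GGG", "s2"), ("GGG", "s1"),  # 3 vs 1: kept
    ("TTT", "s1"), ("TTT", "s1"), ("TTT", "s2"),                 # 2 vs 1: dropped
]

accepted = call_umis(example_reads)
# Abundance is the number of accepted UMIs per sequence, not the read count.
abundance = Counter(accepted.values())
```

In this toy input, ten reads collapse to two counted UMIs, one for each
sequence, which shows why UMI-based counting needs more sequencing depth
than read-based counting for the same number of observations.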