Bioinformatic analysis
RSII subread files in BAX format were converted to the newer BAM format
using “bax2bam” from PacBio SMRT tools 5.0.1, and reads were
demultiplexed with “lima” from PacBio SMRT tools 7.0.1 using the
options “--different” and “--peek-guess”. Sequences that were
not assigned to one of the barcode pairs used in this experiment were
discarded. Circular consensus sequences (CCS) were generated from the
demultiplexed BAM files using “ccs” from PacBio SMRT tools 5.0.1 (the
last version which supports RSII data), resulting in 49,709 reads.
Sequences were oriented in the forward direction by matching the forward
and reverse primer sequences using Cutadapt v.3.0 (Martin, 2011). Only
reads with both a forward and a reverse primer sequence in the correct
orientation (ITS1 and reverse-complemented LR5) were retained.
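
The orientation step can be illustrated with a minimal Python sketch. It is not the Cutadapt procedure itself: it uses exact string matching, whereas Cutadapt tolerates mismatches, and the function names and the short primer sequences in the usage note are hypothetical placeholders rather than the real ITS1 and LR5 sequences.

```python
# Minimal sketch of primer-based read orientation (exact matching only).
# Function names are illustrative and not part of Cutadapt.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_read(read, fwd_primer, rev_primer):
    """Return the read in forward orientation, or None if it is rejected.

    A read is accepted when the forward primer is followed, without
    overlap, by the reverse-complemented reverse primer, either in the
    read as given or in its reverse complement.
    """
    rev_rc = reverse_complement(rev_primer)
    for candidate in (read, reverse_complement(read)):
        start = candidate.find(fwd_primer)
        end = candidate.rfind(rev_rc)
        if start != -1 and end != -1 and start + len(fwd_primer) <= end:
            return candidate
    return None
```

With placeholder primers "GATTACAG" (forward) and "TTGCCAAG" (reverse), a read sequenced from the reverse strand is returned reverse-complemented, and a read lacking either primer yields None.
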
Concatamers (Griffith et al., 2018) were identified by searching for the
primer sequence pairs ITS1/reverse-complemented ITS1,
LR5/reverse-complemented LR5, ITS1/ITS1, and LR5/LR5 within the forward
and reverse strands of each read; reads in which any of these pairs was
detected were discarded. The remaining reads were filtered by length and
quality, retaining reads of 50–2999 bp with a maximum of 12 expected
errors per read. Filtering was performed within the AmpliSeq pipeline or,
for the OTU clustering methods, which did not use AmpliSeq, with VSEARCH
v.2.15.1 (Rognes et al., 2016).
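
The concatemer screen and the length and quality filter can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the analysis: primer pairs are interpreted as the first primer followed later in the same strand by the second, matching is exact, and the number of expected errors is taken as the sum of the per-base error probabilities 10^(-Q/10) derived from the Phred scores. All function names are hypothetical.

```python
# Minimal sketch of concatemer screening and length/expected-error
# filtering. Names are illustrative; exact matching is a simplification.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_pair(seq, first, second):
    """True if `first` occurs in `seq` and `second` occurs after it."""
    i = seq.find(first)
    return i != -1 and seq.find(second, i + len(first)) != -1


def is_concatemer(read, fwd_primer, rev_primer):
    """True if any of the four primer pairs is found on either strand."""
    pairs = [
        (fwd_primer, reverse_complement(fwd_primer)),
        (rev_primer, reverse_complement(rev_primer)),
        (fwd_primer, fwd_primer),
        (rev_primer, rev_primer),
    ]
    strands = (read, reverse_complement(read))
    return any(has_pair(s, a, b) for s in strands for a, b in pairs)


def expected_errors(phred_scores):
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filters(seq, phred_scores, min_len=50, max_len=2999, max_ee=12.0):
    """Apply the length window and the expected-error ceiling."""
    if not min_len <= len(seq) <= max_len:
        return False
    return expected_errors(phred_scores) <= max_ee
```

For example, a 100 bp read with a uniform Phred score of 20 carries about 1 expected error and passes, whereas the same read at a uniform score of 5 carries about 32 expected errors and is rejected.
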