Illumina Library Prep
We used a two-step polymerase chain reaction (PCR) procedure modified
from the IBest Genomics Core at University of Idaho to amplify the 16s
V4 region of the microbial DNA in each sample
(Taylor et al.,
2019). PCR1 utilized 515F/806R primer pairs to amplify the region of
interest, and PCR2 extended the amplicons with sample-specific barcodes.
For PCR1, we used 7.5 uL Phusion Flash High Fidelity Master Mix
(ThermoFisher, Waltham, MA), 0.15 uL of the forward and reverse 50 µM
primers, 0.75 uL molecular grade Bovine Serum Albumen (BSA, 20 mg/mL,
New England BioLabs), 5.45 uL purified water, and 1 uL template DNA. The
thermal cycling protocol was: initial denaturation at 98°C for 10 sec;
28 cycles of denaturation at 98°C for 1 sec, annealing at 57°C for 5
sec, and extension at 72°C for 20 sec, with a final extension at 72°C
for 60 sec (Taylor et
al., 2019). We visualized PCR products on a 2% agarose gel using 10x
Sybr Green (Molecular Probes, Invitrogen; Carlsbad, CA). All iterations
of PCR1 included a positive (Serratia genomic DNA) and a negative
(purified water) control. Mock communities (BEI Resources, ATCC,
Manassas, VA) were also included in some PCR1 runs, as a further measure
of quality control.
We performed PCR1 in triplicate for each sample, then pooled those
replicates for PCR2. Samples (including positive controls) that showed
strong bands in at least two of three PCR1 replicates were diluted 1:4
in purified water before being used as template DNA in PCR2, and all
others were undiluted. In addition to pooling replicates of the negative
controls, negative controls from different PCR1 runs were further pooled
to minimize the number of samples sent for sequencing. Because of this,
some iterations of PCR2 contained a pooled PCR1 negative control, and
some contained a new PCR2 negative control with purified water instead
of DNA template.
For PCR2, we used 10 uL Phusion Flash High Fidelity Master Mix, 0.24 uL
BSA, 8.01 uL of purified water, 0.75 uL of unique barcoded primer pairs
(2 µM; supplied by IBest Genomics Core, at the University of Idaho; see
Taylor et al., 2019),
and 1 uL of the PCR1 pooled product. The thermal cycling protocol was
the same as the protocol for PCR1, but was run for only 8 cycles (for 36
cycles total). We visualized PCR2 products on a 2% agarose gel with 10x
Sybr Green and confirmed that each sample had undergone a band shift
compared to PCR1 (indicative of attachment of barcode primers). We then
pooled all samples, with pool volume based on the intensity of each
sample band, and shipped them to IBest Genomic Core for purification,
DNA quantification, and sequencing on the Illumina MiSeq platform. All
negative controls and mock communities were included in the sequencing
process, but positive controls were not.
Illumina Raw Data Processing and Statistical Analysis
Sequences were received demultiplexed, with adapters and primers
removed. Quality analysis for each sample was performed using FastQC
(Andrews, 2010) and those results were consolidated using MulitQC
(Ewels et al., 2016).
Mean quality scores and length distribution for the whole dataset were
manually inspected and used to determine a cutoff length of 250 bp for
forward reads and 150 bp for reverse reads. Samples were then processed
in R v3.1.6 (R Core
Team, 2020) using the DADA2 (Callahan et al., 2016) pipeline based on
this tutorial:https://benjjneb.github.io/dada2/tutorial.html.
Samples were trimmed as described above and filtered with a max expected
error of 2. An average of 91.6% of reads were kept in all experimental
samples after processing. Taxonomic classification of amplified sequence
variants (ASVs) was performed through the assignTaxonomy function, using
the Silva database
(Quast et al., 2013),
release 132. Potential contaminants were removed with the Decontam
package (Davis et al.,
2018), using the “prevalence” method with a threshold of 0.1. Control
samples, including a control swab (n = 1), extraction blanks (n = 9),
and PCR negatives (n = 24), were used for comparison. Any ASV that had
fewer than 27 reads across all samples was discarded. One cloacal tissue
sample only kept 12 reads after processing and was not included in
analyses. Finally, read numbers were log transformed to account for
differences in read depth. All parameters were determined based on an
analysis of the mock communities.
Once samples had been processed, the phyloseq package
(McMurdie & Holmes,
2013) was used to organize and store data of different types for
analyses. Shannon diversity index values and richness were then
calculated with the “estimate_richness_ function” from phyloseq, and
Faith’s phylogenetic diversity index values were calculated using the
Picante package
(Kembel et al., 2010)
based on phylogenetic trees created and optimized with Phangorn
(Callahan et al.,
2016; Schliep et al., 2016). The alignment was created with the
DECIPHER package
(Wright, 2016). We
examined how these variables differed across tissue types and swab
samples using mixed-effects ANOVAs with lizard ID as a random effect. We
compared fecal and pre-defecation swab communities, as well as pre- and
post-defecation swab communities using separate paired t-tests. Richness
was log-transformed to account for non-normal distribution when
comparing pre- and post-defecation swabs. Phylogenetic diversity was log
transformed when comparing tissue types.
Pairwise distances between samples were calculated by the vegan package
(Oksanen et al.,
2019) using Bray-Curtis distances, and these distances were then used
to generate non-metric multidimensional scaling (NMDS) plots. Using the
trees described above, we used phyloseq to generate weighted UniFrac
distances to compare phylogenetic community composition, and we used
unweighted UniFrac distance to compare community membership of groups;
both metrics were used to generate Principle Coordinate Analysis (PCoA)
plots. For all distance metrics, dispersion between groups was first
tested with the betadisper function from the vegan package, and then a
PERMANOVA test (Adonis function from the vegan package) was performed.
Differential abundance was tested using the Corncob package
(Martin et al.,
2020). All plots were made with the GGplot2 package
(Wickham, 2017).