Illumina Library Prep
We used a two-step polymerase chain reaction (PCR) procedure modified from the IBest Genomics Core at University of Idaho to amplify the 16s V4 region of the microbial DNA in each sample (Taylor et al., 2019). PCR1 utilized 515F/806R primer pairs to amplify the region of interest, and PCR2 extended the amplicons with sample-specific barcodes. For PCR1, we used 7.5 uL Phusion Flash High Fidelity Master Mix (ThermoFisher, Waltham, MA), 0.15 uL of the forward and reverse 50 µM primers, 0.75 uL molecular grade Bovine Serum Albumen (BSA, 20 mg/mL, New England BioLabs), 5.45 uL purified water, and 1 uL template DNA. The thermal cycling protocol was: initial denaturation at 98°C for 10 sec; 28 cycles of denaturation at 98°C for 1 sec, annealing at 57°C for 5 sec, and extension at 72°C for 20 sec, with a final extension at 72°C for 60 sec (Taylor et al., 2019). We visualized PCR products on a 2% agarose gel using 10x Sybr Green (Molecular Probes, Invitrogen; Carlsbad, CA). All iterations of PCR1 included a positive (Serratia genomic DNA) and a negative (purified water) control. Mock communities (BEI Resources, ATCC, Manassas, VA) were also included in some PCR1 runs, as a further measure of quality control.
We performed PCR1 in triplicate for each sample, then pooled those replicates for PCR2. Samples (including positive controls) that showed strong bands in at least two of three PCR1 replicates were diluted 1:4 in purified water before being used as template DNA in PCR2, and all others were undiluted. In addition to pooling replicates of the negative controls, negative controls from different PCR1 runs were further pooled to minimize the number of samples sent for sequencing. Because of this, some iterations of PCR2 contained a pooled PCR1 negative control, and some contained a new PCR2 negative control with purified water instead of DNA template.
For PCR2, we used 10 uL Phusion Flash High Fidelity Master Mix, 0.24 uL BSA, 8.01 uL of purified water, 0.75 uL of unique barcoded primer pairs (2 µM; supplied by IBest Genomics Core, at the University of Idaho; see Taylor et al., 2019), and 1 uL of the PCR1 pooled product. The thermal cycling protocol was the same as the protocol for PCR1, but was run for only 8 cycles (for 36 cycles total). We visualized PCR2 products on a 2% agarose gel with 10x Sybr Green and confirmed that each sample had undergone a band shift compared to PCR1 (indicative of attachment of barcode primers). We then pooled all samples, with pool volume based on the intensity of each sample band, and shipped them to IBest Genomic Core for purification, DNA quantification, and sequencing on the Illumina MiSeq platform. All negative controls and mock communities were included in the sequencing process, but positive controls were not.
Illumina Raw Data Processing and Statistical Analysis
Sequences were received demultiplexed, with adapters and primers removed. Quality analysis for each sample was performed using FastQC (Andrews, 2010) and those results were consolidated using MulitQC (Ewels et al., 2016). Mean quality scores and length distribution for the whole dataset were manually inspected and used to determine a cutoff length of 250 bp for forward reads and 150 bp for reverse reads. Samples were then processed in R v3.1.6 (R Core Team, 2020) using the DADA2 (Callahan et al., 2016) pipeline based on this tutorial:https://benjjneb.github.io/dada2/tutorial.html. Samples were trimmed as described above and filtered with a max expected error of 2. An average of 91.6% of reads were kept in all experimental samples after processing. Taxonomic classification of amplified sequence variants (ASVs) was performed through the assignTaxonomy function, using the Silva database (Quast et al., 2013), release 132. Potential contaminants were removed with the Decontam package (Davis et al., 2018), using the “prevalence” method with a threshold of 0.1. Control samples, including a control swab (n = 1), extraction blanks (n = 9), and PCR negatives (n = 24), were used for comparison. Any ASV that had fewer than 27 reads across all samples was discarded. One cloacal tissue sample only kept 12 reads after processing and was not included in analyses. Finally, read numbers were log transformed to account for differences in read depth. All parameters were determined based on an analysis of the mock communities.
Once samples had been processed, the phyloseq package (McMurdie & Holmes, 2013) was used to organize and store data of different types for analyses. Shannon diversity index values and richness were then calculated with the “estimate_richness_ function” from phyloseq, and Faith’s phylogenetic diversity index values were calculated using the Picante package (Kembel et al., 2010) based on phylogenetic trees created and optimized with Phangorn (Callahan et al., 2016; Schliep et al., 2016). The alignment was created with the DECIPHER package (Wright, 2016). We examined how these variables differed across tissue types and swab samples using mixed-effects ANOVAs with lizard ID as a random effect. We compared fecal and pre-defecation swab communities, as well as pre- and post-defecation swab communities using separate paired t-tests. Richness was log-transformed to account for non-normal distribution when comparing pre- and post-defecation swabs. Phylogenetic diversity was log transformed when comparing tissue types.
Pairwise distances between samples were calculated by the vegan package (Oksanen et al., 2019) using Bray-Curtis distances, and these distances were then used to generate non-metric multidimensional scaling (NMDS) plots. Using the trees described above, we used phyloseq to generate weighted UniFrac distances to compare phylogenetic community composition, and we used unweighted UniFrac distance to compare community membership of groups; both metrics were used to generate Principle Coordinate Analysis (PCoA) plots. For all distance metrics, dispersion between groups was first tested with the betadisper function from the vegan package, and then a PERMANOVA test (Adonis function from the vegan package) was performed. Differential abundance was tested using the Corncob package (Martin et al., 2020). All plots were made with the GGplot2 package (Wickham, 2017).