Sequence analysis
We analyzed raw sequence reads using a bioinformatics pipeline designed to trim the reads and sort them by scat sample. The pipeline comprised three steps: (1) paired-end reads were merged using PEAR (Zhang et al., 2014); (2) merged reads were demultiplexed using the 8-base-pair index sequence unique to each sample, matched with a novel grep regular expression (reads with index mismatches were discarded); and (3) OTUs from each sample were taxonomically assigned using BLAST against the 12S vertebrate sequences available in GenBank.
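As an illustration of the demultiplexing step, the following minimal Python sketch assigns reads to samples by exact match on an 8-base-pair index. It is not the grep expression used in the pipeline. It assumes that the index sits at the 5' end of each merged read, and the function name `demultiplex`, the index sequences, and the sample labels are hypothetical.

```python
import re
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group reads by sample using an exact match on the leading index.

    reads: iterable of merged read sequences (strings).
    index_to_sample: dict mapping each 8-bp index to a sample label.
    Assumption: the index is the first 8 bases of the read.
    """
    # One anchored alternation, e.g. ^(ACGTACGT|TTGCAAGC)
    pattern = re.compile(
        "^(" + "|".join(re.escape(idx) for idx in index_to_sample) + ")"
    )
    by_sample = defaultdict(list)
    for read in reads:
        match = pattern.match(read)
        if match is None:
            continue  # no exact index match, so the read is discarded
        index = match.group(1)
        # Store the read with its index trimmed off
        by_sample[index_to_sample[index]].append(read[len(index):])
    return dict(by_sample)
```

Because the pattern is anchored and contains only literal indexes, a read whose index differs by even one base matches nothing and is dropped, which mirrors the "mismatches discarded" rule.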
We then applied a series of filtering and quality-control measures to the taxonomically assigned sequences. For each of the three iDNA datasets, we removed contaminant reads (primarily human DNA sequences) and removed sample replicates that did not amplify (fewer than 500 reads). We then removed OTUs that had a percent identity below 95% or that accounted for less than 1% of the total number of sequences in that sample. Finally, we eliminated species that were not found in both sample replicates. After these automated filters, we manually reviewed the BLAST results for each purported species to ensure that the 12S barcode discriminated the species from sympatric congeners or confamilials and to confirm that the taxonomic assignments were for species that occur in Mato Grosso, Brazil. If an assigned species was not regional, we examined the other equally scoring matches and reassigned it to a regional species where possible. If no suitable species-level match was found, the taxon was assigned at the genus level or removed from the dataset.
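The automated filters described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the 500-read and 1% thresholds are assumed to be computed per replicate after contaminant removal, a sample with a failed replicate is assumed to retain no species, and the function names and the record layout (species, percent identity, read count) are hypothetical.

```python
MIN_READS = 500               # replicate amplification threshold
MIN_IDENTITY = 95.0           # minimum BLAST percent identity
MIN_FRACTION = 0.01           # minimum share of the replicate's reads
CONTAMINANTS = {"Homo sapiens"}


def filter_replicate(otus):
    """Return the species retained in one replicate, or None if it failed.

    otus: list of (species, percent_identity, read_count) tuples.
    Assumption: totals are computed after contaminant reads are removed.
    """
    otus = [o for o in otus if o[0] not in CONTAMINANTS]
    total = sum(count for _, _, count in otus)
    if total < MIN_READS:
        return None  # replicate did not amplify
    return {
        species
        for species, identity, count in otus
        if identity >= MIN_IDENTITY and count >= MIN_FRACTION * total
    }


def filter_sample(replicate_a, replicate_b):
    """Keep only species that pass the filters in both replicates."""
    kept_a = filter_replicate(replicate_a)
    kept_b = filter_replicate(replicate_b)
    if kept_a is None or kept_b is None:
        return set()  # assumption: a failed replicate leaves nothing to confirm
    return kept_a & kept_b
```

The manual review steps (checking sympatric congeners and regional occurrence) depend on expert judgment and are deliberately left out of the sketch.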