Oliver Manangwa - Authorea

Advances in next-generation sequencing have allowed the use of DNA obtained from unusual sources for wildlife studies. However, these samples have been used predominantly to sequence mitochondrial DNA for species identification, whereas population genetic analyses have been rare. Since next-generation sequencing allows indiscriminate detection of all DNA fragments in a sample, it should technically be possible to sequence whole genomes of animals from environmental samples. Here we used a blood-feeding insect, the tsetse fly, to target whole genome sequences of wild animals. Using pools of flies, we compared the ability to recover genomic data from hosts using short-read sequencing (Illumina) and adaptive sampling of long-read data generated using Oxford Nanopore Technologies (ONT). We found that the short-read data were dominated by tsetse fly DNA (85-99% of reads) and that adaptive sampling on the ONT platform did not substantially reduce this proportion. However, once tsetse reads were removed, the remaining data for both platforms tended to belong to the dominant host expected in the tsetse fly blood meal. Reads mapping to elephants, warthogs and giraffes were recovered more reliably than reads mapping to buffalo, and there was high variance in the contribution of DNA by individual flies to the pools, suggesting that there are host-specific biases. We were able to identify over 300,000 SNPs for elephants, which we used to estimate allele frequencies and expected heterozygosity for the population. Overall, our results show that, at least for certain wild mammals, it is possible to recover genome-wide host data from blood-feeding insects.
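
As an illustration of the final analysis step, the sketch below shows one common way to derive an allele frequency and expected heterozygosity for biallelic SNPs from pooled read counts. It is not the authors' pipeline: the function names and the (reference, alternate) read counts are hypothetical, and the calculation assumes that the alternate-read fraction approximates the population allele frequency p, with expected heterozygosity taken as 2p(1 - p) and no correction for pool size, unequal contribution of individual flies, or sequencing error.

```python
# Minimal sketch, assuming biallelic SNPs and pooled read counts per site.
# All names and numbers here are hypothetical, for illustration only.

def allele_frequency(ref_reads, alt_reads):
    """Estimate the alternate allele frequency as the fraction of alt reads."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def expected_heterozygosity(p):
    """Expected heterozygosity of a biallelic site: 2p(1 - p)."""
    return 2.0 * p * (1.0 - p)


def mean_expected_heterozygosity(sites):
    """Average expected heterozygosity over (ref_reads, alt_reads) pairs."""
    values = [expected_heterozygosity(allele_frequency(r, a)) for r, a in sites]
    return sum(values) / len(values)


# Hypothetical (ref, alt) read counts at three SNPs.
example_sites = [(30, 10), (20, 20), (45, 5)]

if __name__ == "__main__":
    for ref, alt in example_sites:
        p = allele_frequency(ref, alt)
        print(f"p = {p:.3f}, He = {expected_heterozygosity(p):.3f}")
    print(f"mean He = {mean_expected_heterozygosity(example_sites):.3f}")
```

Because flies contribute unequally to a pool, as the abstract notes, read-count frequencies of this kind are at best a rough proxy; a real analysis would likely need depth filters and a pool-aware estimator.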