Bioinformatic Pipeline
To analyze the raw whole-genome sequencing reads, we built a custom bioinformatic pipeline for our study organism that follows the GATK Best Practices (Van der Auwera et al., 2013) (Figure 1). Because whole-genome analysis of each louse was computationally expensive and time-consuming, we ran the GPU-accelerated software suite Clara Parabricks (https://github.com/clara-parabricks) on NVIDIA RTX6000 GPU nodes of the University of Florida’s high-performance computing cluster, HiPerGator, which substantially reduced processing time. The first three steps of the pipeline (reference mapping, coordinate sorting, and duplicate marking) were performed in Parabricks using BWA-MEM (H. Li, 2013) and GATK4 (McKenna et al., 2010; see Supplementary Materials for parameter details).
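As an illustration only, the first three steps could be launched through the Parabricks `fq2bam` tool, which performs BWA-MEM alignment, coordinate sorting, and duplicate marking in a single call. The Python sketch below merely assembles such a command; the file names, the helper name `build_fq2bam_command`, and the GPU count are hypothetical placeholders, and the parameters actually used in this study are those reported in the Supplementary Materials.

```python
import shlex
from typing import List


def build_fq2bam_command(
    reference: str,
    fastq_r1: str,
    fastq_r2: str,
    out_bam: str,
    num_gpus: int = 1,
) -> List[str]:
    """Assemble a Parabricks fq2bam command line (hypothetical helper).

    fq2bam covers reference mapping (BWA-MEM), coordinate sorting, and
    duplicate marking. All paths passed in are placeholders, not the
    files used in the study.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "pbrun", "fq2bam",
        "--ref", reference,               # indexed reference genome (FASTA)
        "--in-fq", fastq_r1, fastq_r2,    # paired-end reads for one sample
        "--out-bam", out_bam,             # sorted, duplicate-marked BAM
        "--num-gpus", str(num_gpus),      # assumed GPU count per job
    ]


if __name__ == "__main__":
    # Example with placeholder file names; prints the command without running it.
    cmd = build_fq2bam_command(
        "reference.fasta",
        "sample_R1.fastq.gz",
        "sample_R2.fastq.gz",
        "sample.markdup.bam",
        num_gpus=2,
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to `subprocess.run` on a GPU node without shell-quoting problems.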