Bioinformatic Pipeline
To analyze the raw whole-genome sequencing reads, we used a custom-made
bioinformatic pipeline for our study organism following the GATK Best
Practices workflow (Van der Auwera et al., 2013) (Figure 1). Because the
whole-genome analysis of each louse was computationally expensive and
time-consuming, we ran the GPU-accelerated software suite Clara
Parabricks (https://github.com/clara-parabricks) on NVIDIA RTX6000 GPU
nodes of the University of Florida’s high-performance computing
cluster, HiPerGator, and significantly
reduced the processing time. The first three steps of the pipeline
(reference mapping, coordinate sorting, and duplicate marking) were
performed in Parabricks using BWA-MEM (H. Li, 2013) and GATK4 (McKenna
et al., 2010; see Supplementary Materials for parameter details).
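
As a rough illustration of how these three steps can be launched together, the sketch below assembles a call to the Parabricks `fq2bam` tool, which bundles BWA-MEM alignment, coordinate sorting, and duplicate marking. The use of `fq2bam` is an assumption, since the text does not name the specific Parabricks tool. The file names and the helper function are hypothetical, and the parameters actually used in this study are those listed in the Supplementary Materials.

```python
import shlex
from pathlib import Path


def build_fq2bam_command(ref_fasta, fastq_r1, fastq_r2, out_bam):
    """Assemble a Parabricks fq2bam call for one paired-end sample.

    Assumption: the `pbrun` executable is available on the GPU node.
    Only the basic input/output options are shown; no tuning flags.
    """
    return [
        "pbrun", "fq2bam",
        "--ref", str(ref_fasta),                   # indexed reference genome
        "--in-fq", str(fastq_r1), str(fastq_r2),   # paired-end reads
        "--out-bam", str(out_bam),                 # sorted, duplicate-marked BAM
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_fq2bam_command(
        Path("reference.fasta"),
        Path("louse01_R1.fastq.gz"),
        Path("louse01_R2.fastq.gz"),
        Path("louse01.markdup.bam"),
    )
    # Print the command instead of running it, so the sketch works
    # without a GPU; on a GPU node it could be passed to
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps file names with spaces intact and makes it straightforward to loop over many samples.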