2.5 Genome-wide test of selection
To detect positive selection outside the coding regions of genes, we used a maximum likelihood analysis of the haplotype frequency spectrum across the genome to identify putative targets of positive selection via signatures of both soft and hard sweeps. For this purpose, we used LASSI Plus (Harris & DeGiorgio, 2020) and the saltiLASSI statistic (DeGiorgio & Szpiech, 2022). This approach can use unphased sequencing data to infer haplotypes, and it identifies genomic regions within population samples whose haplotype frequencies deviate more than expected from genome-wide background patterns, which are taken to represent neutrality. The method estimates the likelihood that a given haplotype is sweeping, as well as the inferred width of the sweep and the number of sweeping haplotypes within a given species. To avoid reference bias, we aligned short-read sequencing data from 15 individuals of each species to their respective reference genomes (Yang et al., 2021) using bwa-mem2 v.2.0pre2 (Vasimuddin, Misra, Li, & Aluru, 2019), and then called variants using bcftools v.1.13-35-ge3ba077 (Danecek et al., 2021) to generate an all-sites VCF. The resulting VCFs were filtered to retain calls with a quality score above 30 (QUAL > 30) and a read depth of 5-50, to exclude indels, and to allow no more than 2 alternative alleles at a given site. Inferences of selective sweeps were made using the salti statistic in the LASSI Plus software package (k = 10, window size = 52, step size = 12). To identify outlier windows across the genome, we extracted all windows with a salti statistic (L) more than 4 standard deviations above the mean. L is a composite likelihood ratio test statistic that measures how strongly the haplotype frequency spectrum in a given window is distorted relative to the genomic background.
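The site filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, where filtering was applied to the bcftools output; it only makes the filter logic explicit. The `Site` record, the `passes_filters` function and the inclusive treatment of the depth bounds are assumptions introduced for illustration, as is the reading of the quality filter as retaining sites with QUAL > 30.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified representation of one VCF record.
@dataclass
class Site:
    chrom: str
    pos: int
    ref: str
    qual: float
    depth: int
    alts: List[str] = field(default_factory=list)  # empty for invariant sites

BASES = {"A", "C", "G", "T", "N"}

def passes_filters(site, min_qual=30.0, min_depth=5, max_depth=50, max_alts=2):
    """Return True if a site survives the filters described in the text.

    Assumptions: QUAL must be strictly greater than min_qual, and the
    depth bounds are inclusive.
    """
    if site.qual <= min_qual:
        return False
    if not (min_depth <= site.depth <= max_depth):
        return False
    if len(site.alts) > max_alts:
        return False
    # Treat any allele that is not a single base as an indel
    # (or other non-SNP allele) and drop the site.
    alleles = [site.ref] + site.alts
    if any(a.upper() not in BASES for a in alleles):
        return False
    return True

sites = [
    Site("chr1", 100, "A", 45.0, 20, ["G"]),            # kept
    Site("chr1", 101, "C", 12.0, 20, ["T"]),            # low quality
    Site("chr1", 102, "G", 60.0, 75, ["A"]),            # depth too high
    Site("chr1", 103, "AT", 60.0, 20, ["A"]),           # indel
    Site("chr1", 104, "T", 60.0, 20, ["A", "C", "G"]),  # too many alt alleles
    Site("chr1", 105, "G", 60.0, 20),                   # invariant site, kept
]
kept = [s.pos for s in sites if passes_filters(s)]
```

Invariant sites are retained here because an all-sites VCF includes them; whether depth is evaluated per site or per sample is not specified in the text and is left to the caller in this sketch.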
Finally, to compare the distribution and location of sweeps among species while avoiding the reference bias that would arise from aligning all samples to a single reference, we scaffolded each species' genome against a common chromosome-level assembly from a species in a sister genus, the beetle Lochmaea crataegi (NCBI: GCA_947563755.1). Scaffolding was performed using RagTag v.2.1.0 (Alonge et al., 2022) with default settings and minimap2 as the aligner, and alignments were filtered to remove any contigs shorter than 50 kb. As above, outlier loci (i.e. those likely to have experienced a sweep) were defined as windows with a salti statistic (L) more than 4 standard deviations above the mean.
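The outlier criterion can be sketched in a few lines of Python. This is an illustration only, not the code used in this study. The `(scaffold, position, L)` tuple layout and the function name `outlier_windows` are hypothetical. The use of the sample standard deviation, and the calculation of the mean and standard deviation across all windows of a species rather than per chromosome, are assumptions that the text does not specify.

```python
from statistics import mean, stdev

def outlier_windows(windows, n_sd=4.0):
    """Return (threshold, outliers) for windows given as (scaffold, position, L).

    A window is an outlier if its L exceeds mean(L) + n_sd * SD(L),
    with both moments computed over all supplied windows.
    """
    scores = [w[2] for w in windows]
    threshold = mean(scores) + n_sd * stdev(scores)
    outliers = [w for w in windows if w[2] > threshold]
    return threshold, outliers

# Toy data: 99 background windows with L = 0 and one window with L = 100.
toy = [("scaffold_1", i, 0.0) for i in range(99)]
toy.append(("scaffold_1", 99, 100.0))
threshold, outliers = outlier_windows(toy)
```

Because an extreme window inflates both the mean and the standard deviation, a threshold of this kind is conservative when strong sweeps are present.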