Whole-Genome Resequencing Analysis
We removed adapter sequences and quality-trimmed reads using AdapterRemoval v2.1.7 (Schubert et al., 2016). We then aligned reads to a new chromosome-level assembly of the yellow-rumped warbler (Setophaga coronata) (Baiz et al., 2021) using Bowtie2 (Langmead & Salzberg, 2012). We marked PCR duplicates with Picard tools (Broad Institute, 2021).
We analyzed the resulting alignments using the ANGSD pipeline (Korneliussen et al., 2014), which is well suited to low-coverage data because it works from genotype likelihoods rather than called genotypes. To measure the extent of differentiation, we calculated a global estimate of FST between the main-range S. v. virens and S. v. waynei. We then estimated FST in 10 kb windows across the genome for the same comparison, to assess whether some regions of the genome are more divergent between the groups than others.
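As an illustration of how a windowed estimate relates to the global one, the sketch below bins per-site FST components into 10 kb windows and takes the ratio of summed numerators to summed denominators in each window. It is not the ANGSD implementation. It assumes that per-site numerator and denominator terms are already available as (chromosome, position, numerator, denominator) tuples, and that windows are non-overlapping; the function names are hypothetical.

```python
from collections import defaultdict

WINDOW = 10_000  # 10 kb; non-overlapping windows are an assumption of this sketch


def windowed_fst(sites, window=WINDOW):
    """Summarize per-site FST components in fixed windows.

    sites: iterable of (chrom, pos, num, den), with 1-based positions.
    Returns a sorted list of (chrom, start, end, n_sites, fst), where
    start/end are 0-based half-open window coordinates.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for chrom, pos, num, den in sites:
        start = (pos - 1) // window * window
        acc = sums[(chrom, start)]
        acc[0] += num
        acc[1] += den
        acc[2] += 1
    result = []
    for (chrom, start), (num, den, n_sites) in sorted(sums.items()):
        # Ratio of sums, not a mean of per-site ratios
        fst = num / den if den > 0 else float("nan")
        result.append((chrom, start, start + window, n_sites, fst))
    return result


def global_fst(sites):
    """Genome-wide estimate: the same ratio of sums over every site."""
    total_num = 0.0
    total_den = 0.0
    for _chrom, _pos, num, den in sites:
        total_num += num
        total_den += den
    return total_num / total_den if total_den > 0 else float("nan")
```

Using the ratio of sums in both functions keeps the windowed and global values directly comparable, since a window covering the whole genome would return the global estimate.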
To identify genes within the divergent regions, we used the annotation associated with the S. coronata genome (Baiz et al., 2021), which was generated within the MAKER annotation pipeline using SNAP gene predictions trained on the Zebra Finch (Taeniopygia guttata) genome. We focused on coding sequences for which MAKER had either transcript or protein matches with the Zebra Finch annotation. We first identified genes within the two regions (see below) that were bounded by 10 kb windows with FST > 0.15. We also focused on two genes intersecting the two highest FST windows.
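The gene selection described above amounts to an interval-overlap query between annotated genes and FST windows. The sketch below is a simplified illustration, not the procedure actually used: it reports any gene overlapping a window above the threshold, rather than delimiting contiguous regions first. It assumes half-open coordinates, windows given as (chromosome, start, end, FST) tuples, and genes given as (name, chromosome, start, end) tuples; all function names are hypothetical.

```python
import math


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def genes_hitting(genes, windows):
    """Names of genes that overlap at least one of the given windows."""
    return [
        name
        for name, chrom, start, end in genes
        if any(
            chrom == w_chrom and overlaps(start, end, w_start, w_end)
            for w_chrom, w_start, w_end, _fst in windows
        )
    ]


def genes_in_divergent_windows(genes, windows, threshold=0.15):
    """Genes overlapping any window whose FST exceeds the threshold."""
    high = [w for w in windows if w[3] > threshold]
    return genes_hitting(genes, high)


def genes_in_top_windows(genes, windows, k=2):
    """Genes overlapping the k windows with the highest FST."""
    valid = [w for w in windows if not math.isnan(w[3])]
    top = sorted(valid, key=lambda w: w[3], reverse=True)[:k]
    return genes_hitting(genes, top)
```

The linear scan is adequate for a handful of candidate regions; a genome-wide query over many genes would usually rely on a sorted or indexed interval structure instead.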
We next performed principal component analysis (PCA) to characterize genome-wide clustering. We used PCAngsd (Meisner & Albrechtsen, 2018) to generate a covariance matrix from genome-wide genotype likelihoods, and R 4.0.5 (R Core Team, 2021) to calculate and plot eigenvalues. Lastly, we constructed a bootstrapped phylogeny to estimate relationships among the populations and to assess roughly when S. v. waynei separated from the main group. We analyzed genotype likelihoods generated by ANGSD for 31,241,801 sites using ngsDist v1.0.8 (Vieira et al., 2016) with 100 bootstrap replicates. We produced trees from the resulting distance matrices using FastME v2.1.6.2 (Lefort et al., 2015) and mapped bootstrap support values onto the main tree using RAxML v8.2.12 (Stamatakis, 2014).
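To make the final step concrete, the sketch below shows one simple way to derive support values for a main tree from a set of bootstrap replicate trees: each clade in the main tree is scored by the percentage of replicates that contain the same clade. This is a deliberately simplified, rooted-tree illustration and not the RAxML procedure, which operates on bipartitions of unrooted trees. Trees are assumed to be nested tuples of leaf names, and the function names and sample labels are hypothetical.

```python
def clades(tree):
    """Return the set of clades (frozensets of leaf names) in a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def bootstrap_support(main_tree, replicate_trees):
    """Percentage of replicate trees containing each clade of the main tree."""
    replicate_clades = [clades(t) for t in replicate_trees]
    if not replicate_clades:
        raise ValueError("at least one replicate tree is required")
    n = len(replicate_clades)
    return {
        clade: 100.0 * sum(clade in rc for rc in replicate_clades) / n
        for clade in clades(main_tree)
    }


# Hypothetical sample labels, for illustration only
main = ((("waynei_1", "waynei_2"), ("virens_1", "virens_2")), "outgroup")
replicates = [
    main,
    ((("waynei_1", "waynei_2"), "virens_1"), ("virens_2", "outgroup")),
    ((("waynei_1", "virens_1"), ("waynei_2", "virens_2")), "outgroup"),
    main,
]
support = bootstrap_support(main, replicates)
```

The clade containing every leaf always receives 100% support under this scheme, so only the internal clades are informative.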