Haplotype network variation across the genome
We inferred haplotype networks across the 94 loci we sequenced. Using an expectation-maximization method to infer linkage disequilibrium among SNPs, we split these loci into 454 linked segments (Table S2). Segments with missing data and those shorter than 100 bp were discarded, leaving 231 segments for haplotype network reconstruction, with A. alba as the outgroup (Figure 4).
Among these segments, 134 did not distinguish the subspecies genetically: only one or a few haplotypes were identified, all closely related to each other and shared among the three subspecies. Another 66 segments reliably distinguish australasica from the other two subspecies. Among these 66 segments, the BB population shares haplotypes with australasica instead of marina at seven loci. A third type of segment, 14 in total, separates marina from the other two subspecies. Five segments distinguish eucalyptifolia, but BB shares haplotypes with eucalyptifolia in all cases. Most importantly, in three segments the haplotypes split into three clusters, and each subspecies contains haplotypes from a single cluster. These three segments provide the best subspecies delineation. At eight other segments, each subspecies also contains its own cluster of haplotypes, except that BB shares haplotypes with eucalyptifolia. Finally, one segment separates marina and australasica, but eucalyptifolia contains haplotypes from both clusters.
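As a bookkeeping check, the segment categories above can be tallied to confirm that they account for all 231 retained segments. The sketch below is illustrative only: the counts are taken from the text, while the dictionary keys and variable names are our own shorthand rather than labels used in the analysis.

```python
# Segment counts per haplotype-network pattern, as reported in the text.
# The category labels are hypothetical shorthand, not the authors' terms.
segment_counts = {
    "no subspecies differentiation": 134,
    "australasica distinct": 66,
    "marina distinct": 14,
    "eucalyptifolia distinct (BB groups with eucalyptifolia)": 5,
    "all three subspecies distinct": 3,
    "all three distinct, except BB groups with eucalyptifolia": 8,
    "marina and australasica distinct, eucalyptifolia mixed": 1,
}

total_segments = sum(segment_counts.values())

# Report each category with its share of the retained segments.
for pattern, count in segment_counts.items():
    print(f"{pattern}: {count} ({count / total_segments:.1%})")

print(f"total segments: {total_segments}")
```

The total should match the 231 segments retained for haplotype network reconstruction.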
The three segments that clearly delineate subspecies come from three genomic loci: Am0259, Amc232, and Amc302. We therefore roughly estimate that about 3% of the A. marina genome is highly differentiated among subspecies (3 out of the 94 genomic loci surveyed). Am0259 partially covers a protein-coding gene whose ortholog in Arabidopsis thaliana is annotated as “shaggy-related protein kinase.” Amc232 and Amc302 are noncoding. The eight segments that follow subspecies delineation, with the exception of the BB population, come from seven genomic loci. Similarly, we estimate that about 7% (7 out of 94) of the A. marina genome is highly diverged among subspecies, but that this divergence is eliminated in populations where subspecies coexist.
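The two genome-wide estimates are simple proportions of the surveyed loci. A minimal sketch of the arithmetic follows; the function name `locus_fraction` is hypothetical, and the calculation assumes, as the rough estimate in the text does, that the 94 loci are a representative sample of the genome.

```python
def locus_fraction(n_loci: int, n_surveyed: int) -> float:
    """Return the fraction of surveyed loci that show a given pattern."""
    if n_surveyed <= 0:
        raise ValueError("n_surveyed must be positive")
    if not 0 <= n_loci <= n_surveyed:
        raise ValueError("n_loci must lie between 0 and n_surveyed")
    return n_loci / n_surveyed


N_SURVEYED = 94  # genomic loci sequenced in this study

# Loci with segments that separate all three subspecies.
fully_differentiated = locus_fraction(3, N_SURVEYED)

# Loci differentiated among subspecies except in the BB population.
differentiated_except_bb = locus_fraction(7, N_SURVEYED)

print(f"{fully_differentiated:.1%}")      # 3.2%
print(f"{differentiated_except_bb:.1%}")  # 7.4%
```

Rounded to whole percentages, these give the figures of about 3% and about 7% quoted above.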