Read mapping and variant calling
The quality of the short reads produced by the Illumina sequencing platforms
was first assessed with FastQC (Andrews, 2010). Short reads were then
mapped to reference sequences using MAQ 0.7.1 (Li, Ruan, & Durbin,
2008). The reference sequences were obtained by Sanger sequencing of DNA
amplicons of all 94 loci from one A. marina individual. The same procedure
was applied to one A. alba individual for use as an outgroup. In mapping
and pileup, the mutation rate between reference and read was set to 0.002,
the threshold on the sum of mismatching base qualities was 200, and the
minimum mapping quality of reads was 30. To
exclude false-positive mismatches, we calculated the mismatch rate at each
position along the read and for each base quality score. We trimmed
the first and last 10 bases of each read and filtered out bases with a
quality score below 30.
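
The trimming, quality filtering, and per-position mismatch counting described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the choice to mask filtered bases as "N", and the input format (gap-free read/reference pairs of equal length) are assumptions made for illustration only. The trim length (10) and quality cutoff (30) are taken from the text.

```python
from collections import defaultdict

TRIM = 10      # bases removed from each end of a read (value from the text)
MIN_QUAL = 30  # minimum base quality retained (value from the text)


def trim_and_mask(seq, quals, trim=TRIM, min_qual=MIN_QUAL):
    """Drop `trim` bases from both ends, then mask low-quality bases as 'N'.

    Masking with 'N' is an illustrative assumption; the text only states
    that bases with quality below 30 were filtered.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) <= 2 * trim:
        return "", []  # nothing is left after trimming both ends
    core_seq = seq[trim:len(seq) - trim]
    core_quals = quals[trim:len(quals) - trim]
    masked = "".join(
        base if q >= min_qual else "N" for base, q in zip(core_seq, core_quals)
    )
    return masked, core_quals


def mismatch_rate_by_position(alignments):
    """Return {position in read: mismatch rate} over gap-free alignments.

    `alignments` is an iterable of (read_seq, ref_seq) pairs of equal
    length (a hypothetical input format). Positions carrying 'N' are skipped.
    """
    mismatches = defaultdict(int)
    totals = defaultdict(int)
    for read, ref in alignments:
        for i, (a, b) in enumerate(zip(read, ref)):
            if a == "N" or b == "N":
                continue
            totals[i] += 1
            if a != b:
                mismatches[i] += 1
    return {i: mismatches[i] / totals[i] for i in sorted(totals)}
```

Plotting the returned rates against read position is one way to see whether mismatches concentrate at the read ends, which is the kind of pattern that would motivate trimming them.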
Variant sites were identified with MAQ 0.7.1, yielding nucleotide
polymorphism information for each population. To avoid bias
introduced by sequencing errors, we discarded sites with insufficient
coverage (<100 reads) and those with a minor allele
frequency below 0.01 in each population (He et al., 2013). This
yielded a list of single nucleotide polymorphisms (SNPs), with allele
frequencies, for each population. To reduce false SNPs introduced by
homopolymers or insertions/deletions, putative variants in those regions
were masked. The resulting 16 sets of SNPs were used in the analyses below.
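
The coverage and minor allele frequency filters can be sketched as follows. This is an illustrative example under stated assumptions, not the code used in this study: the function name `call_site`, the per-site input (a dictionary of base counts for one population), and the definition of the minor allele as the second most frequent base at the site are all hypothetical. Only the two thresholds (100 reads and 0.01) come from the text.

```python
MIN_COVERAGE = 100  # minimum reads covering a site (value from the text)
MIN_MAF = 0.01      # minimum minor allele frequency (value from the text)


def call_site(counts, min_coverage=MIN_COVERAGE, min_maf=MIN_MAF):
    """Apply the coverage and MAF filters to one site in one population.

    `counts` maps each observed base to its read count (hypothetical input).
    Returns (major_allele, minor_allele, maf) if the site passes both
    filters, otherwise None.
    """
    depth = sum(counts.values())
    if depth < min_coverage:
        return None  # insufficient coverage
    # Rank alleles by count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if len(ranked) < 2:
        return None  # monomorphic site
    (major, _), (minor, minor_count) = ranked[0], ranked[1]
    maf = minor_count / depth
    if maf < min_maf:
        return None  # minor allele too rare to separate from sequencing error
    return major, minor, maf
```

For example, a site with 95 reads supporting A and 5 supporting G passes both filters with a minor allele frequency of 0.05, whereas the same 5 G reads among 1,005 total fall below the 0.01 cutoff and the site is discarded.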