Modelling the pattern of lineage-splitting within A.
marina
To infer the lineage-splitting pattern within A. marina , we
compared our real sequences against simulated sequences under eight
models assuming a variety of topologies (Simulation 1). Simulated
sequences under these models were produced using the ms software
(Hudson, 2002). The models are: (1) panmictic; (2) eucalyptifoliaby itself and the other two subspecies together; (3) australasicaby itself and the other two subspecies together; (4) marina by
itself and the other two subspecies together; (5) three separate
lineages with eucalyptifolia diverging first; (6) three lineages
with marina diverging first; (7) three lineages withaustralasica diverging first. (8) three lineages diverging
simultaneously. In simulation 1, groups were divided according to
subspecies designation in the prior. As a control, we constructed
artificial groups by pooling two populations each from one subspecies.
Using these groupings, we repeated the simulations and model selection
on the eight models described above (Simulation 2).
The effective population sizes of the lineages (N) and coalescent times
(T) were common among all models. Notably, to reduce the complexity of
parameter setting and to speed up computation, all population size
parameters were derived from a single parameter N0randomly chosen from the prior distribution. In models with more than
one lineage, N0 was assigned to any one of the lineages
(using as baseline). N of other lineages were produced by multiplying
N0 by θx/θ0, where
θx and θ0 are the observed θ of the
current and baseline lineage respectively.
For each model, we performed 100,000 coalescent simulations using the ms
program (Hudson, 2002). Each simulation contained 80 loci of 1000 base
pairs. Mutation rate was set at
3.26x10-8/generation/bp, estimated from phylogenomic
comparisons to closely related species with whole genomes (He et al.,
2020). The sample size of each group was consistent with our real field
sampling (Table 1). Demographic parameters were drawn randomly from a
uniform prior distribution. Identical prior distributions of
corresponding parameters were set for models within each set (Table S3
& S4).
Ten summary statistics were calculated for each simulated data set,
including segregating site number (S), Watterson’s estimator (θ),
nucleotide polymorphism (π) and Tajima’s D within each group, as well asDXY and FST for each pair
of groups. Summary statistics were calculated for each simulation
independently. Euclidean distances were calculated by comparing
simulated statistics with corresponding observed summary statistics. The
tolerance of retaining simulated data was set to 0.05. Bayesian
posterior probabilities of each model were then estimated following the
Approximate Bayesian Computation (ABC) schema (Beaumont, Zhang, &
Balding, 2002) using the “abc” package in R (Csilléry, François, &
Blum, 2012). The “postpr” function together with “neuralnet” option
in the “abc” R package was used to perform model selection.
We also built four models (v1, v2, v3, and v4) to test whether the
population from Bunbury, Australia (BB, Table1) genetically belongs tomarina or eucalyptifolia (Simulation 3, Table S5). In
model v1 and v2, BB (constant effective population size of
Nbb) and marina (Nma) coalesced
at vT1 generations ago and then the common ancestor
further coalesced with eucalyptifolia (effective population size
Neu) at vT0 generations ago
(vT0>vT1). Model v1
differed from v2 by presence or absence of gene flow (m1and m2) between BB and eucalyptifolia . Similarly,
in models v3 and v4, BB (Nbb) coalesced witheucalyptifolia (Neu) at vT1generations ago. The common ancestor then coalesced with marina(effective population size Nma) at vT0generations ago (vT0>vT1).
Nine summary statistics, Watterson’s estimator (θ) for each population
and pairwise FST and DXY ,
were used in the model selection procedure similar to the one previously
described.