Materials and Methods
Amplification of TiLV genomic segments and whole genome
sequencing
Total RNA extracted from eight TiLV-infected specimens used for
amplification of TiLV genomic sequences in the present study were
obtained from our previous works (Dong, Ataguba et al., 2017; Taengphu
et al., 2019). Segment 1 of all samples were already sequenced (Taengphu
et al., 2019). Thus, the remaining TiLV genome segments 2–10 were
individually amplified by RT-PCR, using primers and conditions as
previously described (Pulido et al., 2019). After agarose gel
electrophoresis, the amplicons were gel purified using FavorPrep GEL/PCR
purification kit (Favorgen) before being cloned into pGEM-T easy vector
(Promega). Recombinant clones were sequenced using T7 promoter, and SP6
promoter primers (Macrogen, South Korea).
Sequence data
Together with whole genomes generated in this study, nine publicly
availably whole genome sequences of TiLVs were retrieved from the NCBI
database (accessed on 25 Feb 2020), including those from Israel
(Bacharach et al., 2016), Ecuador (Subramaniam et al., 2019), Peru
(Pulido et al., 2019), USA (Ahasan et al., 2020), Bangladesh (Chaput et
al., 2020), and other sequences from Thailand (Surachetpong et al.,
2017; Al-Hussinee et al., 2018). Their accession numbers are given inTable S1 . Sampling dates of publicly available TiLV genomes
were retrieved from original publications when available, and from the
NCBI GenBank database otherwise, ranging between 2011 and 2019.
Reassortant detection
analysis
Multiple nucleotide sequence alignments were first constructed
separately for each of the 10 segments using MAFFT v7.453 (Katoh and
Standley, 2013). Each alignment was manually curated to remove
non-protein coding regions. They were then concatenated and was checked
for potential conflicting evolutionary signals by using seven programs
(RDP, GENECONV, Chimera, MaxChi, Bootscan, SiScan, and 3Seq), all
implemented in Recombination Detection Program v4.99 (RDP4) (Martin et
al., 2015). Only those detected by more than four programs were
considered. A few reassortants were inferred, and the concatenated
alignment was split into four alignments according to the breakpoints of
conflicting singals identified (Table S2 ). Potential
reassortment events were then checked again within each alignment, but
no additional signal was found.
Temporal signal
investigation
Four phylogenetic trees were estimated from the four alignments under
the maximum likelihood (ML) framework, implemented in IQ-TREE 2 (Minh et
al., 2020). The best-fit nucleotide substitution model for each
alignment was determined by ModelFinder (Kalyaanamoorthy et al., 2017)
under the Bayesian information criterion (Table S2 ), and used
for the phylogenetic reconstruction. Bootstrap method was used to assess
the clade uncertainty (N = 1,000).
The temporal signals in the nucleotide
datasets were assessed using
root-to-tip regression analysis, implemented in the ape R package v5.3
(Paradis and Schliep, 2018). The genetic distances were inferred from
the estimated ML trees, and the root placements were determined by
minimising the residual mean squared errors. To assess the statistical
significance of the observed temporal signals (i.e. the regression
slopes), a null distribution of the regression slope (N = 1,000) was
estimated for each dataset by randomising the tip-dates (Ramsden et al.,
2009). A one-tail p-value was computed as the proportion of the slopes
in the null distribution that were greater than or equal to the observed
slope. We considered the temporal signal as sufficient if the p-value
was below 0.05. The results are summarised in Table S3 .
Tip-dating analysis
Time-calibrated phylogenies were inferred using a Bayesian approach
implemented in BEAST v1.10.4 (Rambaut et al., 2018). The strict clock
and the Bayesian Skyride coalescent prior (Minin et al., 2008) were
applied for all trees. The root age was assigned a uniform prior over
the interval of 7.67–200 years, where 7.67 (i.e., the minimum root
height) was the age difference between the youngest and the oldest
samples. The rough estimates of the root height of about 10–20 years
from the root-to-tip analyses lied within this interval. For each
analysis, two independent Markov chain Monte Carlo (MCMC) chains were
run for 100,000,000 iterations. The first 10,000,000 were discarded as
burnin, and parameter values were logged every
10,000th iteration. Effective sample sizes of all
parameters were between 1,000–5,000, indicating that all parameters
were sufficiently well sampled. Convergence of the MCMC chains was
assessed by comparing the two independently estimated posteriors.
Maximum clade credibility trees with common ancestor node heights were
used to summarise the tree posterior distributions.
Sensitivity analyses
To explore the robustness of our results, we repeated the analyses under
different choices of priors and model assumptions. First, we relaxed the
strict clock assumption by using the uncorrelated relaxed clock model
with a lognormal distribution of the rates among branches (Drummond et
al., 2006). To compare the two models, we estimated the Bayes factor by
using the generalised stepping stone (GSS) sampling technique to
estimate the marginal likelihood for each of the models (Baele et al.,
2015) (Table S4 ). For the GSS estimator, we used 100 path steps
with 10,000,000 iterations of MCMC at each step. The product of
exponential distribution was used as the starting distribution.
Second, the Skyride coalescent model assumed that the population size
changes coincide with the internal nodes of the phylogeny. We relaxed
this assumption by using the Skygrid coalescent model (Gill et al.,
2012), which allows the population size to change at regular grid
points. Instead of performing a separate analysis for each alignment,
this method also allowed a multilocus analysis. Each region had its own
(best-fit) nucleotide substitution process, and a phylogenetic tree
structure, but they all shared a mean substitution rate. To specify the
grid points in the Skygrid model, the time cutoff value was set to 14
years (beyond which the population size was assumed to be constant).
This was based on the root age estimates of around 2005 from the main
analyses. The population size was allowed to change every 0.5 years. We
also varied the time cutoff parameter while the grid points remained 0.5
years apart. We found that all analyses yielded highly similar estimates
(Table S5 ).
Third, there were a number of incomplete TiLV genome sequences deposited
in the NCBI database, available as individual genome segments. We
retrieved them from the database. Short sequences (i.e. those with
length < 90% of the shortest sequence in the main dataset)
were excluded to allow for more information to be retained in the
alignments. In total, four segments had additional sequences, including
segments 1, 3, 5, and 9 (Table S2 and Table S6 ). For
each segment, the same set of analyses, including the ML tree
estimation, temporal signal assessment, and Bayesian phylodynamic
inference, was performed as aforementioned.