Materials and Methods

Amplification of TiLV genomic segments and whole genome sequencing

Total RNA extracted from eight TiLV-infected specimens used for amplification of TiLV genomic sequences in the present study were obtained from our previous works (Dong, Ataguba et al., 2017; Taengphu et al., 2019). Segment 1 of all samples were already sequenced (Taengphu et al., 2019). Thus, the remaining TiLV genome segments 2–10 were individually amplified by RT-PCR, using primers and conditions as previously described (Pulido et al., 2019). After agarose gel electrophoresis, the amplicons were gel purified using FavorPrep GEL/PCR purification kit (Favorgen) before being cloned into pGEM-T easy vector (Promega). Recombinant clones were sequenced using T7 promoter, and SP6 promoter primers (Macrogen, South Korea).

Sequence data

Together with whole genomes generated in this study, nine publicly availably whole genome sequences of TiLVs were retrieved from the NCBI database (accessed on 25 Feb 2020), including those from Israel (Bacharach et al., 2016), Ecuador (Subramaniam et al., 2019), Peru (Pulido et al., 2019), USA (Ahasan et al., 2020), Bangladesh (Chaput et al., 2020), and other sequences from Thailand (Surachetpong et al., 2017; Al-Hussinee et al., 2018). Their accession numbers are given inTable S1 . Sampling dates of publicly available TiLV genomes were retrieved from original publications when available, and from the NCBI GenBank database otherwise, ranging between 2011 and 2019.

Reassortant detection analysis

Multiple nucleotide sequence alignments were first constructed separately for each of the 10 segments using MAFFT v7.453 (Katoh and Standley, 2013). Each alignment was manually curated to remove non-protein coding regions. They were then concatenated and was checked for potential conflicting evolutionary signals by using seven programs (RDP, GENECONV, Chimera, MaxChi, Bootscan, SiScan, and 3Seq), all implemented in Recombination Detection Program v4.99 (RDP4) (Martin et al., 2015). Only those detected by more than four programs were considered. A few reassortants were inferred, and the concatenated alignment was split into four alignments according to the breakpoints of conflicting singals identified (Table S2 ). Potential reassortment events were then checked again within each alignment, but no additional signal was found.

Temporal signal investigation

Four phylogenetic trees were estimated from the four alignments under the maximum likelihood (ML) framework, implemented in IQ-TREE 2 (Minh et al., 2020). The best-fit nucleotide substitution model for each alignment was determined by ModelFinder (Kalyaanamoorthy et al., 2017) under the Bayesian information criterion (Table S2 ), and used for the phylogenetic reconstruction. Bootstrap method was used to assess the clade uncertainty (N = 1,000).
The temporal signals in the nucleotide datasets were assessed using root-to-tip regression analysis, implemented in the ape R package v5.3 (Paradis and Schliep, 2018). The genetic distances were inferred from the estimated ML trees, and the root placements were determined by minimising the residual mean squared errors. To assess the statistical significance of the observed temporal signals (i.e. the regression slopes), a null distribution of the regression slope (N = 1,000) was estimated for each dataset by randomising the tip-dates (Ramsden et al., 2009). A one-tail p-value was computed as the proportion of the slopes in the null distribution that were greater than or equal to the observed slope. We considered the temporal signal as sufficient if the p-value was below 0.05. The results are summarised in Table S3 .

Tip-dating analysis

Time-calibrated phylogenies were inferred using a Bayesian approach implemented in BEAST v1.10.4 (Rambaut et al., 2018). The strict clock and the Bayesian Skyride coalescent prior (Minin et al., 2008) were applied for all trees. The root age was assigned a uniform prior over the interval of 7.67–200 years, where 7.67 (i.e., the minimum root height) was the age difference between the youngest and the oldest samples. The rough estimates of the root height of about 10–20 years from the root-to-tip analyses lied within this interval. For each analysis, two independent Markov chain Monte Carlo (MCMC) chains were run for 100,000,000 iterations. The first 10,000,000 were discarded as burnin, and parameter values were logged every 10,000th iteration. Effective sample sizes of all parameters were between 1,000–5,000, indicating that all parameters were sufficiently well sampled. Convergence of the MCMC chains was assessed by comparing the two independently estimated posteriors. Maximum clade credibility trees with common ancestor node heights were used to summarise the tree posterior distributions.

Sensitivity analyses

To explore the robustness of our results, we repeated the analyses under different choices of priors and model assumptions. First, we relaxed the strict clock assumption by using the uncorrelated relaxed clock model with a lognormal distribution of the rates among branches (Drummond et al., 2006). To compare the two models, we estimated the Bayes factor by using the generalised stepping stone (GSS) sampling technique to estimate the marginal likelihood for each of the models (Baele et al., 2015) (Table S4 ). For the GSS estimator, we used 100 path steps with 10,000,000 iterations of MCMC at each step. The product of exponential distribution was used as the starting distribution.
Second, the Skyride coalescent model assumed that the population size changes coincide with the internal nodes of the phylogeny. We relaxed this assumption by using the Skygrid coalescent model (Gill et al., 2012), which allows the population size to change at regular grid points. Instead of performing a separate analysis for each alignment, this method also allowed a multilocus analysis. Each region had its own (best-fit) nucleotide substitution process, and a phylogenetic tree structure, but they all shared a mean substitution rate. To specify the grid points in the Skygrid model, the time cutoff value was set to 14 years (beyond which the population size was assumed to be constant). This was based on the root age estimates of around 2005 from the main analyses. The population size was allowed to change every 0.5 years. We also varied the time cutoff parameter while the grid points remained 0.5 years apart. We found that all analyses yielded highly similar estimates (Table S5 ).
Third, there were a number of incomplete TiLV genome sequences deposited in the NCBI database, available as individual genome segments. We retrieved them from the database. Short sequences (i.e. those with length < 90% of the shortest sequence in the main dataset) were excluded to allow for more information to be retained in the alignments. In total, four segments had additional sequences, including segments 1, 3, 5, and 9 (Table S2 and Table S6 ). For each segment, the same set of analyses, including the ML tree estimation, temporal signal assessment, and Bayesian phylodynamic inference, was performed as aforementioned.