TiLV genome dataset
Individual genome segments of TiLVs isolated from eight infected fish samples were RT-PCR amplified, cloned, and sequenced. The samples were collected in Thailand between 2014–2019. The sequences are available from the GenBank database (accession numbers: MN697695–697774). We were able to obtain complete genomes for all eight of them; each comprised 10 segments, termed Segment 1 to 10 according to their sizes (Segment 1 being the largest; Figure S1 ). Together with nine previously published TiLV complete genomes, we estimated the past transmission history and origin of TiLV. The information of all 17 complete genomes used in this study is summarised in Table S1 .

Reassortant detection analysis

A recent study showed that TiLVs can and do undergo reassortment process (Chaput et al., 2020). This means that different genome segments may have different evolutionary histories, and analysing multiple genomic regions with conflicting evolutionary signals together can result in an erroneous estimation of the virus evolutionary history.
Potential conflicting evolutionary signals were checked in a concatenated nucleotide alignment of segments 1–10 by using several programs (see Materials and Methods ). A few reassortants were detected in our dataset (Figure 1 ). The detected signal breakpoints coincided well with the segment boundaries. Our results showed that segments 1–4 shared the same evolutionary history, while segments 5 and 6 had their own distinct histories. Segments 7–10 also shared a common history, but which was distinct from those of others. The concatenated alignment was thus split accordingly (Table S2 ).
Our analysis identified three potential reassortment events: one involving exchanging of segments 1–4 between (the ancestors of) TH-2016-CN isolate and an unknown TiLV, another one involving exchanging of segment 5 between (the ancestors of) TH-2018-K and BD-2017, and the other one involving an exchange of segments 5 and 6 between (the ancestors of) IL-2011 and EC-2012. These results were consistent with the findings previously reported by Chaput et al. (2020). They compared tree topologies estimated from individual genomic segments, and found evidence for reassortment of segments 5 and 6 involving IL-2011 and EC-2012, and that BD-2017 might be involved in a reassortment event. This suggested a role of reassortment in emergence of new combinations of TiLV genomic materials, similar to the one well-established in its distant relatives, influenza viruses (Steel and Lowen, 2014).

Phylogenetic analyses of TiLVs

Chaput et al. (2020) analysed each individual genome segment separately, and yielded phylogenetic trees with high estimation uncertainties (Chaput et al., 2020). This was likely because each segment on its own had relatively low phylogenetic signals. In this study, we therefore grouped the segments based on their shared evolutionary history, and analysed them together. This allowed us to construct relatively longer nucleotide alignments with more phylogenetically informative nucleotide positions (Table S2 ), and in turn gave us more power to resolve the phylogenetic relationships and yield more robust evolutionary estimates (Figure 2 ). Our sequence collection was also “time-stamped”, enabling us to perform tip-dating analyses. In this study, we applied a Bayesian phylogenetic framework to simultaneously co-estimate the TiLVs’ phylogenetic relationships, divergence dates, the rate of evolution, and their population dynamics.
The temporal signal in each sequence dataset was first quantified using root-to-tip regression analysis (Rambaut et al., 2016), and we found that all datasets had a statistically significant signal (p < 0.05; Table S3 ), sufficient for tip-dating analysis. Across all four genomic regions, our analyses consistently estimated the time to most recent common ancestor (tMRCA) of all TiLVs to be between 2003–2009, with the rates of evolution ranging between ~1.81–3.47×10-3 substitutions per site per year (s/n/y) (Figure 2 and Table 1 ). These rate estimates fell within the expected range RNA viruses’ rate (Sanjuán et al., 2010). The analysis of segments 1–4 yielded the most precise date estimate of around mid 2008 (95% highest posterior density (HPD) interval = 2007.5–2009.6). Segment 6 gave the oldest estimate in 2003, but it was also the most uncertain one (95% HPD interval = 1992–2008). This might be indicative of a low, but detectible, temporal signal in the data. Alternatively, this could genuinely reflect its unique early origin and reassortment history. Population dynamics of TiLV estimated from segments 1–4 and 7–10 yielded a potential peak in the population size around 2014–2016, followed by a continuous decline in the population from 2016 onwards. Analyses of segments 5 and 6 gave similar results, although the peaks and the declines were much more modest (Figure 3 ).
Similar estimates of the dates, rates, and demographic histories were obtained under the relaxed clock model (Figures S2S3 , and Table S4 ). Indeed, our analyses did not favour the relaxed clock over the strict clock for all analyses (Table S4 ; log Bayes factor < 3), suggesting relatively constant evolutionary rates over time. Analyses assuming the same evolutionary rate for all alignments based on the Skygrid coalescent model under the strict clock assumption yielded a rate estimate of ~3.0 (2.2–3.4) ×10-3s/n/y, a tMRCA estimate of 2007–2008, and a population size trajectory that peaked around 2014 (Figures S4S5 , andTable S5 ), again, all comparable to those from the main analyses. The estimated phylogenetic relationships were also consistent with those obtained from the main analyses. Finally, molecular clock analyses of larger datasets for segments 1, 3, 5, and 9 yielded largely consistent estimates of tMRCA, evolutionary rates, and demographic histories (Figures S6S9 , Table S4 , andTable S7 ). Altogether, these analyses showed that our date, and rate estimates, as well as demographic histories were robust to the choice of model assumptions.
The estimated tMRCA of 2003–2009 matched well with the first report of a sharp decline in tilapia populations in Israel from 2005 (Eyngor et al., 2014), and the reported mass die-offs in farmed tilapia in Ecuador and Israel staring in 2009 (Bacharach et al., 2016). This also suggested the presences of TiLV in Israel and Ecuador many years before the first report of the virus outbreak in 2014, which, incidentally, was also around the time when the virus population was at its peak (Figure 3 ). Moreover, the analyses consistently showed that the TiLV population steadily declined from 2016 onwards, 1–2 years following the first report of the virus. This could be a result of natural resistance and herd immunity building up in the fish population against the virus, and / or a reflection of a better awareness of the virus, and a better-prepared and more cautious Tilapia importation protocol following the first virus report.
Examination of the four phylogenies revealed complex histories of TiLVs (Figure 2 ). We found that TiLVs from Thailand, including our new samples, often appeared to cluster together, but not always as a single group. They formed a well-supported cluster in the Segment 1–4 tree (clade support = 0.99), and the same was seen in the Segment 7–10 tree, but the grouping was not well supported (clade support < 0.80). In contrast, the virus appeared to split into several well-supported clusters in the Segment 5 and 6 trees; however, due to the low support of deep and long branches, it was therefore perhaps too premature to make any detailed inferences about how TiLVs emerged or were introduced to Thailand. Our analyses, nonetheless, dated the tMRCAs of Thai TiLVs to be between 2008–2011 for the well-supported group in the Segment 1–4 tree. Although TiLV in Thailand was not reported until 2017 (Dong, Siriroob et al., 2017; Surachetpong et al., 2017), retrospective diagnosis of archived fish specimens revealed the TiLV must have been present in Thailand before 2012 (Dong, Ataguba et al., 2017), consistent with and further supporting our results. In addition, analysis of segment 6 robustly inferred TH-2014 to be most closely related to Middle East and South American isolates (clade support = 0.99), while it was found to group with other Thai TiLVs in the Segment 1–4 tree (clade support = 0.99). This suggested that reassortment could indeed happen, even between viruses that are geographically very far apart.
Regarding TiLVs from other countries, the two USA isolates from a tilapia farm in Idaho were consistently shown to be embedded within the clade of Thai TiLVs, having relatively short branches. In particular, they were found to have a clear grouping structure with two Thai isolates, namely TH-2018-N and TH-2019, forming a small clade in all trees (clade support = 1). Their MRCA was dated to be around 2016–2017. This coincided with a recent record that the farm in the USA imported live tilapia from Thailand (Ahasan et al., 2020).
Similarly, the Bangladesh isolate, BD-2017, was inferred to be clustering with Thai isolates in all trees, except in the Segment 5 tree, which showed that it was most closely related to Middle East and South American TiLVs although not well supported. The youngest branching date of the BD-2017 was estimated to be 2012 (segment 6; 95% HPD interval = 2006–2015). Assuming that there was a single introduction of the virus to Bangladesh, this result suggested that TiLV was introduced to Bangladesh after the early 2010s. All of these findings were perfectly congruent with an earlier forecast (Dong, Ataguba et al., 2017), and the results previously reported by Chaput et al. (2020).
Regarding the four Middle East and South American TiLVs, they always appeared to form a group in all trees. Specifically, TiLVs from Ecuador and Peru were inferred to be sisters in all trees, except in the Segment 6 tree, which supported a sister relationship between the Ecuador isolate EC-2012 and the Israeli isolate IL-2011 (Til-4-2011). In contrast, the two isolates from Israel (Til-4-2011 and AD-2016) did not form a clade, and were often found to be the most basal ones in the group. The only exception was that they formed a clade in the Segment 7–10 tree, but then again, the clade was not well-supported, and it was placed relatively deeply towards the root. Together, these findings suggested a single introduction of TiLV into South America perhaps from the Middle East, with an estimated timing around 2010–2013.
Our results were inconclusive about the likely geographical origin of TiLV as a whole. The Segment 1–4 tree showed a well-supported cascading structure of Israeli TiLVs basal to all other isolates, and the most parsimonious interpretation of this was that TiLV originated in Israel or nearby regions. We noted that, in the Segment 5 and 6 trees, the isolates from Israel (and South America) were nested within Thai isolates, suggesting that this might not be the case; however, the support of such pattern was lacking. In fact, our analyses of all segments apart from segments 1–4 consistently showed deep and long branch structures towards the root of the trees. Coupled with the facts that deep branching structures were not well-supported, and that the samples were quite sparse, it was unclear where TiLV actually originated. Additional sequences from diverse locations, and sampling times are thus required to conclusively determine the geographical origin of TiLV.