Methods

We searched GenBank to obtain sequence data for RHDV1 (Table S1) before different virus strains were known to occur in Australia (Hall et al., 2015, Mahar et al., 2017). In fact, up until 2014, descendants of the originally released RHDV strain were the only virulent RHDV strains known to circulate in Australia (Kovaliski et al., 2014). It is important to highlight that, in all RHDV molecular studies conducted in Australia, sampling was opportunistic, i.e. sampling occurred from rabbit carcasses opportunistically encountered after a disease outbreak. Hence, while we had a substantial temporal coverage, we could only obtain data for one or two individuals within each site and year.
We retained only sequences where, in the original publication, the number of screened samples were stated or where it was confirmed with the author(s) that no bias was introduced towards highly variable sequences. Time of collection of the samples was included in the analysis as fractional years before present. We aligned the sequences in Geneious (Kearse et al., 2012) and, as most sequences found were from the hypervariable gene that encodes for the capsid protein VP60, we trimmed the alignment to include only data for this gene, leaving 1740 nt for analysis.
Using the collated dataset, we conducted coalescent and newly developed phylodynamic analyses and compared results with known epidemiological data collected in the field, thereby providing an indication of the accuracy of these analytical methods implemented in a natural context. Specifically, we initially used the Bayesian sampled ancestor birth-death skyline tree prior (SABDSKY, Gavryushkina et al., 2014). We considered this the most flexible model and used it to test the fit of the data to a strict molecular clock, or alternatively a lognormal or exponential relaxed molecular clock (Drummond et al., 2006) by using the nested sampling approach (Russel et al., 2018) and comparing the Bayes Factor between models (Baele et al., 2012). For each model, we gradually increased the subchain length until we obtained two approximately equal maximum likelihood estimates, between consecutive analyses. We then conducted a second run with the longest subchain to ensure consistent results. We used a very tight lognormal prior around the origin parameter (log mean=2.95 and log stdev=0.01), with a hard lower and upper bound of 18.47 and 19.8 years before the most recent sample, as Australia was RHDV free before its release and we considered it unlikely for outbreaks to go unnoticed for a period longer than a few months. We used a lognormal prior also for R e (log mean=0.5 and log stdev=1) and the total removal rate δ (Gavryushkina et al., 2014) (log mean=2 and log stdev=2). We used a uniform distribution bounded between 0 and 1 for the sampling proportion parameter, but fixed this parameter to zero prior to 1995 because no samples were collected prior to this date. We also fixed the removal probability parameter to zero because we sampled very few individuals across the many that had been infected, and even though in some instances we removed the whole carcass at sampling (therefore reducing the probability of transmitting the disease), it is likely that they already had opportunities to infect other rabbits, so we considered appropriate to make the assumption that sampling did not affect RHDV1 outbreaks. We initially modelledR e and δ with a piecewise constant function through time, allowing different values for these parameters for each five year “epoch” (i.e. a total of four time intervals after 1995). We considered this sufficient to detect changes in these parameters and to specifically test the hypothesis that there had been a change after 2005. However, on inspection of preliminary results, we noted an inverse correlation between R e and δ. Therefore, for all analyses we used a common δ across all epochs while lettingR e to vary. Furthermore, to better reflect the sampling times, we fixed the sampling proportion parameter to zero in the years where we did not have any samples as an alternative approach.
In all analyses, we used bModelTest (Bouckaert and Drummond, 2017) to take into account the uncertainty of the substitution model, and a gamma prior (shape=1, scale=0.001) for the clock rate, which encompasses the expected range for an RNA virus (Biek et al., 2015). After defining the molecular clock, we repeated the analysis with a Birth-Death skyline (BDSKY, Stadler et al., 2013), a Bayesian skyline (BSP, Drummond et al., 2005) and a time-aware Gaussian Markov random field Bayesian skyride coalescent tree priors (Skyride, Minin et al., 2008). We allowed the BSP analysis to consider six population size changes, and applied a lognormal population size prior (with precision equal to 1) to reconstruct changes over time of the virus population size. The mean of the lognormal population size prior for each epoch is the estimated population size of the previous epoch, with a uniform prior (between 0 and 3.8 X 105) for the oldest epoch. For the few sequences for which the exact date of collection was unknown (accession number: GU373617, GU373618, EU650679, EU650680), we put a lognormal prior on the tip date to match the possible range and reflect this uncertainty (the season of collection for these sequences was known, Table S1). It should be noted that the tip dates sampler operator, which is commonly used for this purpose, cannot be used in the SABDSKY model (Gavryushkina pers. comm.), as it is incompatible with the extended state space implied by that model. Instead, we replaced this with the purposely designed ‘Sampled Node Date Random Walker’ operator developed and described by Gavryushkina et al. (2014). Lastly, we attempted to take into account geographical compartmentalisation using the multitype Birth-Death model (aka: Birth-Death with Migration Model, BDMM) (Kühnert et al., 2016). For this analysis we removed the (few) sequences from Western Australia and Tasmania, and limited the analysis to two demes: central-south Australia and east Australia (all priors were kept as above). All analyses were conducted in BEAST2 (Bouckaert et al., 2014) except the skyride analysis, which was run in BEAST1 (Drummond and Rambaut, 2007) with a GTR + Г+ I. Tracer (Rambaut and Drummond, 2007) was used to ensure convergence of the two MCMC that were run for each model (that is, that independent analyses sampled the same parameter space towards the end of the MCMC), and to generate the data for the BSP and the skyride plot. The two runs were then combined for final analysis. Plots for the BDSKY and SABDSKY analyses were generated with the R package bdskytools (https://github.com/laduplessis/bdskytools). All BEAST input files are available from the corresponding author.