Markov Random Field networks
To understand the role of host- and habitat-related variables and spatial configuration of the hosts, as well as virus interactions (Figure1, Table 1) in explaining the network structure, we fitted both Unconditional (MRF) and Conditional Markov Random Field models (CRF) to the virus community data (Clark et al. 2018). Markov Random Fields (MRFs) are graphical models, which can represent complex distributions as network graphs. These networks consist of nodesand edges , corresponding to the observed variables within the data, and to the probabilistic interactions between variables that need to be estimated. The edge associations are undirected, meaning that the effect of one node on another is reciprocal. If there is no edge between two nodes in the estimated graph, these nodes are conditionally independent from one another, whereas if there is an edge, these nodes are conditionally dependent, after accounting for the other node effects in the graph model (Cheng et al. 2014). Conditional Random Fields allow for these dependencies among nodes to be further conditional on other covariates (Cheng et al. 2014; Clarket al. 2018). Hence, the values for the edge associations can change in the presence of these covariates, and the resulting graph model illustrates the pairwise associations between viruses in host plants, conditional not only on the rest of the virus community, but also on the covariates included in the model (Table 1).
The modelling framework is described in detail by Clark et al.(2018). Briefly, we modelled the log-odds of detecting virus igiven covariate x and occurrence of virus j with
\begin{equation} \log\left(\frac{P\left(y_{i}=1\middle|y_{\backslash i},\ x\right)}{1-P\left(y_{i}=1\middle|y_{\backslash i},\ x\right)}\right)=\ \alpha_{i0}+\beta_{i}^{T}x+\sum_{j:j\neq i}{\left(\alpha_{ij0}+\beta_{\text{ij}}^{T}x\right)\text{\ y}_{j}}\ ,\nonumber \\ \end{equation}
where \(y_{i}\) is the vector of presences and absences of virusi ; \(y_{\backslash i}\) denotes the presences and absences of all other viruses except i ; \(\alpha_{i0}\) is the virus-level intercept; and \(\beta_{i}^{T}\) is the effect of covariate \(x\) on the occurrence probability of virus i . Parameters \(\alpha_{ij0}\)and \(\beta_{\text{ij}}^{T}\) represent the associations between viruses, conditional on the occurrences of all the non-focal viruses (other than the focal virus i ).
We used data on viruses with at least 10 occurrences (16 viruses) in the entire virus community matrix (i.e. minimum prevalence of 2.5% of sampling units). To understand how environmental characteristics and the host affect the virus community, we included several explanatory, conditioning variables in the model (Table 1), describing: 1) The level of spatial autocorrelation of the host populations (implemented as Moran’s eigenvectors) 2) habitat-related characteristics, namely the quality of the habitat of the host plants (the connectedness (S ) of the focal P. lanceolata population to other populations, agricultural land use (percentage of the surrounding landscape) and the Shannon diversity of the local plant community, which have been demonstrated to influence virus occurrences in this system (Susi & Laine 2020), and as the weather conditions of local populations (severity of the previous winter and temperature sum over the effective summer days during previous summer); as well as 4)host-related characteristics of the focal host plants (host population size, host plant size and signs of herbivory). See Table 1 for full details.
Altogether our dataset used for modelling consisted of 16 virus taxa and 16 explanatory variables (Table 1), resulting in 272 coefficients in each regression. To avoid overfitting, regularisation has been implemented in the method through least absolute shrinkage and selection operator (LASSO), forcing some regression coefficients to zero, and thus performing variable selection and reducing the risk of overfitting (Clark et al. 2018). To achieve an undirected network and symmetry within the coefficients of conditional dependence (i.e.\(\alpha_{ij0}=\alpha_{ji0}\) and\(\beta_{\text{ij}}^{T}=\beta_{\text{ji}}^{T}\)) we take the mean of the corresponding estimates, which is the default setting of the applied algorithm (Clark et al. 2018).
We fitted six model variants in total: 1) an Unconditional Markov Random Field model (referred to as ‘MRF’), with only virus occurrences included, 2) a Conditional Markov Random Field model (CRF) with only habitat- and host-related (collectively referred as ‘environmental’, see Table 1) variables included as additional constrains (‘CRFenv’), 3) a CRF model with only host-related variables included as additional constraints (‘CRFhost’), 4) a CRF model with only spatial variables and variables related to habitat (quality and weather) included as additional constraints (‘CRFhabitat’), 5) a CRF model with only spatial variables included as additional constrains (‘CRFspat’), 6) a Conditional Markov Random Field model with both all environmental as well as spatial variables included as additional constrains (‘CRFfull’). We will refer to the variants (2-6) collectively as ‘CRF models’ or ‘CRFs’.
We evaluated the model fit by calculating the Area Under Curve values (Hanley & Mcneil 1982) using the full data set. We used cross validation (with four folds) to estimate model generality by comparing predicted and observed outcomes simultaneously for all taxa. To account for parameter uncertainty of the final model, we modelled 100 bootstrapped replicates for the model. If the 90% confidence interval of bootstrapped coefficients did not overlap with zero, we considered the variable to have a statistically significant effect. To test whether the viral associations were phylogenetically conservative, we compared the direct associations drawn from all our network models to the phylogenetic relationships of the viruses, constructed from taxonomy, by conducting a Mantel test between the matrices.
All results were produced with R (version 4.0.2, R Core Team 2020), and packages ‘vegan’ (Oksanen et al. 2019), ‘MFRcov’ (Clark et al. 2018), ‘igraph’ (Csardi & Nepusz 2006), along with their dependencies. An R package called ‘meta17-network’ including the analytical pipeline, data, and documentation for full reproduction of the results can be found in Github (aminorberg/meta17network-pkg).