Markov Random Field networks
To understand the role of host- and habitat-related variables and
spatial configuration of the hosts, as well as virus interactions
(Figure1, Table 1) in explaining the network structure, we fitted both
Unconditional (MRF) and Conditional Markov Random Field models (CRF) to
the virus community data (Clark et al. 2018). Markov Random
Fields (MRFs) are graphical models, which can represent complex
distributions as network graphs. These networks consist of nodesand edges , corresponding to the observed variables within the
data, and to the probabilistic interactions between variables that need
to be estimated. The edge associations are undirected, meaning that the
effect of one node on another is reciprocal. If there is no edge between
two nodes in the estimated graph, these nodes are conditionally
independent from one another, whereas if there is an edge, these nodes
are conditionally dependent, after accounting for the other node
effects in the graph model (Cheng et al. 2014). Conditional
Random Fields allow for these dependencies among nodes to be further
conditional on other covariates (Cheng et al. 2014; Clarket al. 2018). Hence, the values for the edge associations can
change in the presence of these covariates, and the resulting graph
model illustrates the pairwise associations between viruses in host
plants, conditional not only on the rest of the virus community, but
also on the covariates included in the model (Table 1).
The modelling framework is described in detail by Clark et al.(2018). Briefly, we modelled the log-odds of detecting virus igiven covariate x and occurrence of virus j with
\begin{equation}
\log\left(\frac{P\left(y_{i}=1\middle|y_{\backslash i},\ x\right)}{1-P\left(y_{i}=1\middle|y_{\backslash i},\ x\right)}\right)=\ \alpha_{i0}+\beta_{i}^{T}x+\sum_{j:j\neq i}{\left(\alpha_{ij0}+\beta_{\text{ij}}^{T}x\right)\text{\ y}_{j}}\ ,\nonumber \\
\end{equation}where \(y_{i}\) is the vector of presences and absences of virusi ; \(y_{\backslash i}\) denotes the presences and absences of all
other viruses except i ; \(\alpha_{i0}\) is the virus-level
intercept; and \(\beta_{i}^{T}\) is the effect of covariate \(x\) on the
occurrence probability of virus i . Parameters \(\alpha_{ij0}\)and \(\beta_{\text{ij}}^{T}\) represent the associations between
viruses, conditional on the occurrences of all the non-focal viruses
(other than the focal virus i ).
We used data on viruses with at least 10 occurrences (16 viruses) in the
entire virus community matrix (i.e. minimum prevalence of 2.5% of
sampling units). To understand how environmental characteristics and the
host affect the virus community, we included several explanatory,
conditioning variables in the model (Table 1), describing: 1) The level
of spatial autocorrelation of the host populations (implemented
as Moran’s eigenvectors) 2) habitat-related characteristics,
namely the quality of the habitat of the host plants (the
connectedness (S ) of the focal P. lanceolata population to
other populations, agricultural land use (percentage of the surrounding
landscape) and the Shannon diversity of the local plant community, which
have been demonstrated to influence virus occurrences in this system
(Susi & Laine 2020), and as the weather conditions of local
populations (severity of the previous winter and temperature sum over
the effective summer days during previous summer); as well as 4)host-related characteristics of the focal host plants (host
population size, host plant size and signs of herbivory). See Table 1
for full details.
Altogether our dataset used for modelling consisted of 16 virus taxa and
16 explanatory variables (Table 1), resulting in 272 coefficients in
each regression. To avoid overfitting, regularisation has been
implemented in the method through least absolute shrinkage and selection
operator (LASSO), forcing some regression coefficients to zero, and thus
performing variable selection and reducing the risk of overfitting
(Clark et al. 2018). To achieve an undirected network and
symmetry within the coefficients of conditional dependence (i.e.\(\alpha_{ij0}=\alpha_{ji0}\) and\(\beta_{\text{ij}}^{T}=\beta_{\text{ji}}^{T}\)) we take the mean of
the corresponding estimates, which is the default setting of the applied
algorithm (Clark et al. 2018).
We fitted six model variants in total: 1) an Unconditional Markov Random
Field model (referred to as ‘MRF’), with only virus occurrences
included, 2) a Conditional Markov Random Field model (CRF) with only
habitat- and host-related (collectively referred as ‘environmental’, see
Table 1) variables included as additional constrains (‘CRFenv’), 3) a
CRF model with only host-related variables included as additional
constraints (‘CRFhost’), 4) a CRF model with only spatial variables and
variables related to habitat (quality and weather) included as
additional constraints (‘CRFhabitat’), 5) a CRF model with only spatial
variables included as additional constrains (‘CRFspat’), 6) a
Conditional Markov Random Field model with both all environmental as
well as spatial variables included as additional constrains (‘CRFfull’).
We will refer to the variants (2-6) collectively as ‘CRF models’ or
‘CRFs’.
We evaluated the model fit by calculating the Area Under Curve values
(Hanley & Mcneil 1982) using the full data set. We used cross
validation (with four folds) to estimate model generality by comparing
predicted and observed outcomes simultaneously for all taxa. To account
for parameter uncertainty of the final model, we modelled 100
bootstrapped replicates for the model. If the 90% confidence interval
of bootstrapped coefficients did not overlap with zero, we considered
the variable to have a statistically significant effect. To test whether
the viral associations were phylogenetically conservative, we compared
the direct associations drawn from all our network models to the
phylogenetic relationships of the viruses, constructed from taxonomy, by
conducting a Mantel test between the matrices.
All results were produced with R (version 4.0.2, R Core Team 2020), and
packages ‘vegan’ (Oksanen et al. 2019), ‘MFRcov’ (Clark et
al. 2018), ‘igraph’ (Csardi & Nepusz 2006), along with their
dependencies. An R package called ‘meta17-network’ including the
analytical pipeline, data, and documentation for full reproduction of
the results can be found in Github (aminorberg/meta17network-pkg).