Network link prediction
We applied the following network link prediction (NLP) algorithms:
The plug-and-play algorithm (Dallas et al. 2017)
predicts missing links based on conditional probability estimation.
This model was developed to infer, from a set of input parameters, the
probability that an unobserved link exists but went undetected.
The Poisson N-mixture link prediction model (Fu et
al. 2019) combines the Poisson N-mixture model used in ecological
research with a low-rank collaborative filtering approach. Poisson
N-mixture models are used in ecological research to account for
imperfect detection in field observations (Royle 2004). Meanwhile,
low-rank matrix completion–based collaborative filtering methods are
a popular approach for NLP in social network studies. Missing entries
in a data matrix are completed from a small number of known entries,
under the assumption that the matrix has low rank, e.g. to predict
consumer preferences (Candes & Plan 2010).
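
As an illustration of the low-rank completion idea only (this is not the Fu et al. 2019 model and contains no N-mixture component), the following Python sketch fills the missing entries of a small matrix by fitting a rank-k factorisation to the observed entries. The function name `complete_low_rank`, the use of stochastic gradient descent, and all parameter values are assumptions made for this example.

```python
import random


def complete_low_rank(matrix, rank=1, steps=5000, lr=0.01, reg=0.0, seed=0):
    """Fill None entries of `matrix` with a rank-`rank` factorisation U V^T.

    The factors are fitted by stochastic gradient descent on the observed
    (non-None) entries only; `reg` is an optional L2 penalty on the factors.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Small positive random starting values for both factor matrices.
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cols)]
    observed = [
        (i, j, matrix[i][j])
        for i in range(n_rows)
        for j in range(n_cols)
        if matrix[i][j] is not None
    ]
    for _ in range(steps):
        for i, j, value in observed:
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = value - pred
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    # The completed matrix is the product of the fitted factors.
    return [
        [sum(U[i][k] * V[j][k] for k in range(rank)) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy example: a rank-1 matrix with one hidden entry (row 3, column 2).
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, None]]
completed = complete_low_rank(toy)
print(round(completed[2][1], 2))
```

Because the toy matrix is exactly rank 1, the hidden entry is determined by the observed ones. With real interaction data the rank and the penalty would have to be chosen by validation.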
We provided ecological, morphological, and phylogenetic input parameters
to these models (Table 1). Neither NLP model currently allows
accounting for phylogenetic uncertainty. Therefore, we included only the
majority-rule consensus host and parasite BI phylogenies and the
dendrograms calculated through the ward.D2 algorithm, one of the most
widely used clustering algorithms (Murtagh & Legendre 2014). To avoid
overfitting, we reduced the number
of input variables per parameter through principal coordinate analyses
(PCoA) of the distance matrices of each parameter. Distance matrices of
some parameters (Table 1) were inferred from dendrograms built through
clustering methods employed for the host niche dendrograms. Distance
matrices were computed through the cophenetic function in R v4.0.0
(R Core Team 2021). To address missing data, we imputed
the data matrix (see Dallas et al. 2017) through the
expectation-maximisation with bootstrapping algorithm as implemented in
the R package amelia (Honaker et al. 2011). Overall, we
provided 9 input parameters consisting of 25 variables (Table 1).
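
The dimension-reduction step can be illustrated with a minimal Python sketch of classical PCoA: the squared distances are double-centred and the leading eigenvectors are extracted, here by power iteration so that no external library is needed. The function name `pcoa`, the iteration count, and the tolerance are assumptions for this example. It assumes a symmetric distance matrix whose negative eigenvalues, if any, are negligible. It is not the R code used in the study.

```python
import math


def pcoa(dist, n_axes=2, iters=500, tol=1e-9):
    """Classical PCoA of a symmetric distance matrix given as a list of lists.

    Returns (coords, eigenvalues); coords[i] holds the position of object i
    on each retained axis. Axes with eigenvalue <= tol are not returned.
    """
    n = len(dist)
    # Gower double-centring of -0.5 * d^2 (row and column means coincide
    # because the distance matrix is assumed symmetric).
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    b = [
        [a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]

    def matvec(v):
        return [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[] for _ in range(n)]
    eigenvalues = []
    for axis in range(n_axes):
        # Arbitrary deterministic starting vector for the power iteration.
        v = [math.sin(i + 1.0 + axis) for i in range(n)]
        for _ in range(iters):
            w = matvec(v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        # Rayleigh quotient gives the eigenvalue, including its sign.
        lam = sum(vi * wi for vi, wi in zip(v, matvec(v)))
        if lam <= tol:
            break
        eigenvalues.append(lam)
        for i in range(n):
            coords[i].append(math.sqrt(lam) * v[i])
        # Deflate so that the next pass finds the next axis.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords, eigenvalues


# Toy example: three objects lying on a line at positions 0, 1 and 3.
toy_dist = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
toy_coords, toy_eigenvalues = pcoa(toy_dist)
print(toy_coords, toy_eigenvalues)
```

In the toy example a single axis reproduces all pairwise distances, so only one axis is returned. For real distance matrices, the retained axes would serve as the reduced input variables for each parameter.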
We determined model accuracies as the Area Under the Receiver Operating
Characteristic curve (AUROC) statistic through 10-fold cross validation.
Each time, the algorithms were trained on 80% of the interaction matrix
to predict the remaining 20%. We implemented the models in R v4.0.0
(R Core Team 2021) and MATLAB v9.9.0 (MathWorks, Natick,
USA) using the provided codes (doi: 10.6084/m9.figshare.4965038;
https://github.com/Hutchinson-Lab/Poisson-N-mixture). Following Dallas
et al. (2017), we assessed variable importance of the plug-and-play
algorithm by measuring the reduction in model
performance resulting from 500-fold permutation of each of the input
variables. For variable assessment and host-parasite link prediction,
the algorithm was trained on the full dataset. This assessment was not
performed for the Poisson N-mixture model because no implementation
was available.
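
The accuracy measure and the permutation-based importance assessment can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code of Dallas et al. (2017): labels are assumed to be coded 0/1, `predict` is a hypothetical callable standing in for a fitted model, and the names `auroc` and `permutation_importance` are chosen for this example.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    Uses the pairwise (Mann-Whitney) formulation; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, n_permutations=500, seed=0):
    """Mean drop in AUROC after shuffling each input variable in turn.

    `predict` maps one row (a list of variable values) to a score.
    Returns (baseline_auroc, list_of_mean_drops_per_variable).
    """
    rng = random.Random(seed)
    baseline = auroc(labels, [predict(r) for r in rows])
    drops = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_permutations):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only variable j permuted.
            shuffled = [
                r[:j] + [column[i]] + r[j + 1:] for i, r in enumerate(rows)
            ]
            permuted = auroc(labels, [predict(r) for r in shuffled])
            total += baseline - permuted
        drops.append(total / n_permutations)
    return baseline, drops


# Toy example: the stand-in model uses only the first variable.
toy_rows = [[0.1, 5.0], [0.4, 3.0], [0.35, 9.0], [0.8, 1.0]]
toy_labels = [0, 0, 1, 1]
base, importance = permutation_importance(lambda r: r[0], toy_rows, toy_labels)
print(base, importance)
```

A variable that the model ignores yields a drop of zero, while permuting an informative variable pulls the AUROC towards 0.5.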