Constructing and Testing Poisson Lognormal (PLN) Models
To construct our seasonal interaction networks, we first fitted and
tested a suite of PLN models. PLN models are a type of joint species
distribution model that uses abiotic factors and species’ abundance
data to infer species’ joint abundances (Momal et al. 2020). To
control for environmental effects, we built models that included
different combinations of (i) water temperature during sampling, (ii)
dissolved oxygen during sampling, (iii) date of abundance sampling, (iv)
latitude, and (v) site name. We incorporated site name as a potential
variable to account for any site-specific abiotic measurements not
captured by other factors. We also included sampling effort in our
models, as excluding it reduces the comparability of abundance
samples measured at different places and times (Chiquet et al. 2019). Sampling effort was included for each abundance sample and was
pre-calculated as the sum of the total counts of fish caught, a common
approach for including sampling effort in models (Paulson et al. 2010). Altogether, the six PLN models built for each season accounted
for the following environmental variable(s): Site name; Water temperature; Dissolved oxygen; Site name +
Latitude; Site name + Water temperature; and Site name +
Dissolved oxygen.
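As a minimal sketch of the set-up described above, the following Python snippet computes sampling effort as the total count of fish per sample and lists the six covariate combinations. The count matrix, the variable names (`site_name`, `water_temperature`, and so on) and the R-style formula strings are hypothetical and serve only as an illustration; they are not the code used in the study. Writing the offset on the log scale is likewise an assumption, based on the usual log-link convention.

```python
# Hypothetical count matrix: rows are abundance samples, columns are species.
counts = [
    [3, 0, 5],
    [1, 2, 0],
    [0, 0, 4],
]


def sampling_effort(count_rows):
    """Return the total number of fish caught in each sample (row sums)."""
    return [sum(row) for row in count_rows]


# The six covariate sets listed in the text, one per candidate model.
# Variable names are illustrative assumptions.
MODEL_COVARIATES = [
    ("site_name",),
    ("water_temperature",),
    ("dissolved_oxygen",),
    ("site_name", "latitude"),
    ("site_name", "water_temperature"),
    ("site_name", "dissolved_oxygen"),
]


def model_formula(covariates):
    """Build an R-style formula string with sampling effort as a log offset."""
    return "abundance ~ " + " + ".join(covariates) + " + offset(log(effort))"


effort = sampling_effort(counts)
formulas = [model_formula(c) for c in MODEL_COVARIATES]
```

Here `effort` holds one value per abundance sample, and each entry of `formulas` describes one of the six candidate models fitted per season.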
To test our seasonal PLN models, we performed a two-step procedure: (1)
we evaluated models using a non-traditional Bayesian Information Criterion
(BIC), an information-theoretic approach; and (2) we validated the PLN
models against withheld future abundance data. Although our objective
was not to predict species abundances, by comparing models using
information-theoretic techniques and by calculating their predictive
performance on withheld future data, we could select models that had
higher accuracy and lower uncertainty (Bodner et al. in press).
The BIC scores were calculated and PLN models with the worst scores per
season were discarded. Note that these are non-traditional BIC scores that
represent the variational lower bound of the BIC, which accounts for the
model’s variational log-likelihood and its number of parameters.
Under this formulation, higher scores indicate better-fitting models. The top three
seasonal PLN models with the highest BIC scores were then tested against
the withheld dataset. We evaluated the predictive capabilities of each
of the top three models by withholding the two most recent sampling times
for each site and season as validation data. We predicted
species abundances and compared the predictions (using root mean squared
error: “RMSE”) to the withheld validation datasets. We also calculated
RMSE only for species with abundances greater than 0 (“RMSE
obs>0”). Of the three PLN models tested per season, the
final seasonal model selected was the one with the highest predictive
ability, that is, the lowest RMSE and RMSE obs>0.
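The selection steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's implementation. The BIC lower bound is assumed to take the form of the variational log-likelihood minus half the number of parameters times the log of the sample size, a sign convention consistent with higher scores being better. All function names, record fields (`site`, `season`, `time`) and example values are hypothetical.

```python
from math import log, sqrt


def bic_lower_bound(variational_loglik, n_params, n_samples):
    """Penalised variational log-likelihood (assumed form); higher is better."""
    return variational_loglik - 0.5 * n_params * log(n_samples)


def top_models(scores, k=3):
    """Return the names of the k models with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def split_most_recent(records, n_holdout=2):
    """Withhold the n_holdout most recent sampling times per (site, season).

    A site with n_holdout or fewer sampling times contributes no training data.
    """
    times = {}
    for r in records:
        times.setdefault((r["site"], r["season"]), set()).add(r["time"])
    held = {key: set(sorted(ts)[-n_holdout:]) for key, ts in times.items()}
    train = [r for r in records if r["time"] not in held[(r["site"], r["season"])]]
    test = [r for r in records if r["time"] in held[(r["site"], r["season"])]]
    return train, test


def rmse(observed, predicted, positive_only=False):
    """Root mean squared error; optionally restricted to observations > 0."""
    pairs = [(o, p) for o, p in zip(observed, predicted)
             if o > 0 or not positive_only]
    return sqrt(sum((o - p) ** 2 for o, p in pairs) / len(pairs))


# Hypothetical scores for four candidate models in one season.
scores = {"site": -120.0, "temp": -150.0, "oxygen": -140.0, "site+lat": -118.0}
best_three = top_models(scores)

# Hypothetical sampling records for one site and season.
records = [{"site": "A", "season": "spring", "time": t} for t in (1, 2, 3, 4)]
train, test = split_most_recent(records)

# Hypothetical observed and predicted abundances for the withheld samples.
rmse_all = rmse([0, 2, 4], [1, 2, 2])
rmse_pos = rmse([0, 2, 4], [1, 2, 2], positive_only=True)
```

Under this sketch, the model retained for a season would be the one among `best_three` with the lowest `rmse_all` and `rmse_pos` on the withheld samples.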