Constructing and Testing Poisson Lognormal (PLN) Models
To construct our seasonal interaction networks, we first fitted and tested a suite of PLN models. PLN models are a type of joint species distribution model that uses abiotic factors and species abundance data to infer species' joint abundances (Momal et al. 2020). To control for environmental effects, we built models that included different combinations of (i) water temperature during sampling, (ii) dissolved oxygen during sampling, (iii) date of abundance sampling, (iv) latitude, and (v) site name. We incorporated site name as a potential variable to account for any site-specific abiotic conditions not captured by the other factors. We also included sampling effort in our models, because excluding it reduces the comparability of abundance samples taken at different places and times (Chiquet et al. 2019). Sampling effort was pre-calculated for each abundance sample as the total count of fish caught, a common approach for including sampling effort in models (Paulson et al. 2010). Altogether, the six PLN models built for each season accounted for the following environmental variable(s): Site name, Water temperature, Dissolved oxygen, Site name + Latitude, Site name + Water temperature, and Site name + Dissolved oxygen.
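As an illustration of the sampling-effort calculation described above, the following minimal Python sketch sums the counts in each abundance sample. It is not the code used in this study: the function name, the toy count matrix, and the log transformation (PLN-type models commonly take offsets on the log scale) are assumptions made for the example.

```python
import numpy as np

def total_count_offset(counts):
    """Return per-sample total counts and their natural logarithm.

    `counts` is a samples x species matrix of non-negative counts.
    Taking the log is an assumption of this sketch, reflecting the
    common practice of supplying offsets on the log scale.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)  # total fish caught in each sample
    if np.any(totals <= 0):
        raise ValueError("every sample needs at least one individual")
    return totals, np.log(totals)

# Hypothetical toy data: 3 samples (rows) x 3 species (columns).
toy_counts = np.array([[3, 0, 5],
                       [1, 1, 0],
                       [10, 2, 8]])
totals, log_offset = total_count_offset(toy_counts)
```

Samples with a total count of zero are rejected here because their log offset would be undefined; how such samples should be handled in practice is a modelling decision outside this sketch.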
To test our seasonal PLN models, we used a two-step procedure: (1) we evaluated the models using a non-traditional Bayesian Information Criterion (BIC), an information-theoretic approach; and (2) we validated the PLN models against withheld future abundance data. Although our objective was not to predict species abundances, comparing models using information-theoretic techniques and calculating their predictive performance on withheld future data allowed us to select models with higher accuracy and lower uncertainty (Bodner et al. in press).
We calculated BIC scores for all models and discarded the PLN models with the worst scores in each season. Note that these are non-traditional BIC scores that represent the variational lower bound of the BIC, which accounts for the model’s variational log-likelihood and its number of parameters. Overall, higher scores indicate better-fitting models. The three seasonal PLN models with the highest BIC scores were then tested against the withheld dataset. To evaluate the predictive capabilities of each of these three models, we withheld the two most recent sampling times for each site and season as validation data. We then predicted species abundances and compared the predictions to the withheld validation datasets using root mean squared error (“RMSE”). We also calculated RMSE only for species with abundances greater than 0 (“RMSE obs>0”). Of the three PLN models tested per season, the final seasonal model selected was the one with the highest predictive ability (i.e., the lowest error) as determined by RMSE and RMSE obs>0.
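The two selection steps can be sketched in Python as follows. This is a hedged illustration rather than the analysis code: the penalised form of the score (variational log-likelihood minus half the number of parameters times the log of the sample size) is assumed here as one common way of writing a higher-is-better BIC lower bound, the masking rule for RMSE obs>0 (keeping entries whose observed abundance exceeds 0) is our reading of the text, and all model names and numbers are invented for the example.

```python
import numpy as np

def bic_lower_bound(variational_loglik, n_params, n_samples):
    """Higher-is-better BIC-type score (assumed form, see lead-in)."""
    return variational_loglik - 0.5 * n_params * np.log(n_samples)

def rmse(observed, predicted, positive_only=False):
    """Root mean squared error between observed and predicted abundances.

    With positive_only=True, only entries whose observed abundance is
    greater than 0 are scored (our interpretation of "RMSE obs>0").
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("observed and predicted must have the same shape")
    if positive_only:
        mask = observed > 0
        if not mask.any():
            return float("nan")  # nothing to score
        observed, predicted = observed[mask], predicted[mask]
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

# Hypothetical candidates: name -> (variational log-likelihood, parameters).
candidates = {
    "model_a": (-1200.0, 40),
    "model_b": (-1250.0, 30),
    "model_c": (-1190.0, 60),
    "model_d": (-1300.0, 25),
}
n_samples = 60  # hypothetical number of abundance samples

scores = {name: bic_lower_bound(ll, p, n_samples)
          for name, (ll, p) in candidates.items()}
# Keep the three highest-scoring models for validation on withheld data.
top_three = sorted(scores, key=scores.get, reverse=True)[:3]

# Hypothetical withheld abundances and predictions for a single model.
withheld = np.array([0.0, 2.0, 4.0])
predicted = np.array([1.0, 2.0, 2.0])
rmse_all = rmse(withheld, predicted)
rmse_pos = rmse(withheld, predicted, positive_only=True)
```

In a full workflow, both RMSE values would be computed for each of the three retained models in each season, and the model with the lowest errors would be kept.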