Statistical methods
For each water quality variable, we fitted the multiple regression model
Y = Control Site + Distance + Month + Year,
where Y is BOD, nitrate-N, orthophosphate, ammonia-N, DO, pH, or
temperature. Control Site is an indicator (1 = control site, 0 = every
other site) that allows for an upward or downward shift in the mean of Y
at the control site relative to the other sites. Its coefficient will be
positive or negative, respectively, depending on whether the control site
had elevated or lowered levels of Y compared to the other sites. Distance is
the number of kilometers downstream from the control site (0.0-63.1
kilometers) and is used to test for an upstream-to-downstream trend in
Y. Month is a categorical variable that allows for seasonal fluctuations
in Y, and Year (2016-2019) allows for long-term changes in Y.
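Written out with explicit coefficients, the model above can be sketched as follows. The symbols \(\beta_{0},\ldots,\beta_{3}\), \(\gamma_{m}\), and \(\varepsilon\) are notation introduced here for illustration only. Coding Month as 11 indicators against a reference month (assuming all 12 months appear in the data) and Year as a single linear term are assumptions, since the text does not state how these terms were coded.
\begin{equation}
Y=\beta_{0}+\beta_{1}\,\text{Control Site}+\beta_{2}\,\text{Distance}+\sum_{m=2}^{12}\gamma_{m}\,\text{Month}_{m}+\beta_{3}\,\text{Year}+\varepsilon,\nonumber
\end{equation}
where \(\text{Month}_{m}\) equals 1 for observations made in month \(m\) and 0 otherwise. In this notation, \(\beta_{1}\) is the shift in the mean of Y at the control site and \(\beta_{2}\) is the change in Y per kilometer downstream.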
Observations made close together in space and time are correlated, so we
fitted the models using generalized least squares, which allows for
correlated observations, with a Gaussian spatio-temporal correlation
structure (Wikle, Zammit-Mangion, & Cressie, 2019). Under this structure, the
correlation between two observations \(Y_{i}\) and \(Y_{j}\) a distance \(d^{\text{st}}\) apart in space
and time is
\begin{equation}
\text{cor}\left(Y_{i},\ Y_{j}\right)=e^{-{(d^{\text{st}}/r)}^{2}},\nonumber
\end{equation}
where the range parameter \(r\) controls the spatio-temporal extent
of the correlation. Following Liu et al. (2017), we defined the
spatio-temporal distance \(d^{\text{st}}\) between a location \(s_{i}\) on the river on day \(t_{i}\) and another location \(s_{j}\) on day \(t_{j}\) as a combination of the spatial distance \(|s_{i}-s_{j}|\) (kilometers) and the temporal distance \(|t_{i}-t_{j}|\) (days),
\({(d^{\text{st}})}^{2}={(s_{i}-s_{j})}^{2}+\tau^{2}\times{(t_{i}-t_{j})}^{2}\),
where \(\tau\) is a time scaling factor that balances the different
scales of spatial and temporal distances. We selected the values of \(r\) and \(\tau\) using 10-fold cross-validation, and fitted the models to
the data in R using the "nlme" package (Pinheiro, Bates, DebRoy, &
Sarkar, 2019). Statistical significance of model terms was assessed at
the 0.05 level.
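The distance and correlation formulas above can be illustrated with a minimal Python sketch. This is not the fitting code used in the study (the models were fitted in R). The function names and the values of `tau` and `r` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def st_distance(s_i, t_i, s_j, t_j, tau):
    """Spatio-temporal distance between (s_i, t_i) and (s_j, t_j).

    s_* are river positions in kilometers, t_* are days, and tau
    rescales days so they are comparable with kilometers.
    """
    return math.sqrt((s_i - s_j) ** 2 + (tau * (t_i - t_j)) ** 2)

def gaussian_correlation(d_st, r):
    """Gaussian correlation exp(-(d_st / r)^2) with range parameter r."""
    return math.exp(-((d_st / r) ** 2))

# Hypothetical values, not the cross-validated ones from the study.
tau = 0.1   # time scaling factor (kilometers per day)
r = 5.0     # range parameter

# Two observations 3 km and 40 days apart.
d = st_distance(10.0, 0, 13.0, 40, tau)
rho = gaussian_correlation(d, r)
print(f"d_st = {d:.3f}, correlation = {rho:.3f}")
```

With these values the spatial part contributes \(3^{2}=9\) and the temporal part \((0.1\times 40)^{2}=16\), so \(d^{\text{st}}=5\) and the correlation is \(e^{-1}\), about 0.37. Two observations at the same place and time have \(d^{\text{st}}=0\) and correlation 1.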
We identified spatio-temporal hot (or cold) spots in each of the seven
water quality variables using a Getis-Ord local G procedure (Ord &
Getis, 1995). More specifically, the statistic \(G_{i}^{*}\) was
computed for each spatio-temporal observation point, giving roughly \(n=875\) values of \(G_{i}^{*}\) for each variable (but only \(n=559\) for BOD and \(n=686\) for ammonia-N). \(G_{i}^{*}\) is a standardized ratio of a local mean to the global mean. It
identifies spatio-temporal clusters of relatively high or low values of
the water quality variable. Hot spots are observation points for which \(G_{i}^{*}\) is higher than a Bonferroni-corrected 95th percentile
cutoff, and cold spots are points for which it is lower than the 5th
percentile cutoff. Local means were computed within distance \(d^{\text{st}}=4.8\) of each observation point, and we carried out
the procedure in R using the "spdep" package (Bivand & Wong, 2018).
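The hot spot procedure can be illustrated with a minimal Python sketch of the \(G_{i}^{*}\) statistic in the form given by Ord & Getis (1995). It is not the "spdep" code used in the study. It makes three assumptions not stated in the text: neighbor weights are binary (1 within the distance threshold, 0 outside), the cutoffs are standard normal quantiles, and the Bonferroni correction is applied symmetrically to the lower tail. All function names and data are hypothetical.

```python
import math
from statistics import NormalDist

def local_g_star(values, coords, d, tau):
    """Getis-Ord G_i* for each point, using binary weights.

    coords holds (kilometer, day) pairs; point j is a neighbor of
    point i (weight 1) when their spatio-temporal distance is <= d.
    Each point is its own neighbor, as G_i* requires.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    result = []
    for s_i, t_i in coords:
        w = [
            1.0 if math.sqrt((s_i - s_j) ** 2 + (tau * (t_i - t_j)) ** 2) <= d else 0.0
            for s_j, t_j in coords
        ]
        w_sum = sum(w)
        w_sq = sum(x * x for x in w)
        # Local weighted sum minus its expectation under the global mean.
        num = sum(wj * xj for wj, xj in zip(w, values)) - mean * w_sum
        den = sd * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        # den is zero when every point is a neighbor or all values are equal.
        result.append(num / den if den > 0 else float("nan"))
    return result

def bonferroni_cutoff(n, alpha=0.05):
    """Upper standard normal cutoff with a Bonferroni correction over n tests."""
    return NormalDist().inv_cdf(1.0 - alpha / n)

def classify(g_values, alpha=0.05):
    """Label each point 'hot', 'cold', or 'none' (symmetric cutoffs assumed)."""
    cut = bonferroni_cutoff(len(g_values), alpha)
    return ["hot" if g > cut else "cold" if g < -cut else "none" for g in g_values]

# Hypothetical data: six same-day observations in two clusters along the river.
coords = [(0.0, 0), (1.0, 0), (2.0, 0), (10.0, 0), (11.0, 0), (12.0, 0)]
values = [9.0, 10.0, 11.0, 1.0, 2.0, 3.0]

g = local_g_star(values, coords, d=2.5, tau=1.0)
labels = classify(g)
```

In this toy example the upstream cluster gets positive \(G_{i}^{*}\) values and the downstream cluster negative ones. With only six points, none exceeds the Bonferroni-corrected cutoff, which shows how the correction guards against declaring hot spots from a handful of observations.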