Autoregressive time-series models
A seasonal autoregressive integrated moving average (SARIMA) model was fitted to the time series of IgG seroprevalence (all age groups) for Gorakhpur district for 2013–2022. These data were selected because of their geographic proximity to Gorakhpur Station, where the climate data were recorded, and because the largest proportion of samples came from this division.
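For reference, the standard multiplicative form of a SARIMA(p,d,q)(P,D,Q)_s model in backshift notation is shown below. The seasonal period s = 12 for monthly data is an assumption. The orders actually selected for this analysis are not restated here.

$$
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t
$$

In this notation:

- $B y_t = y_{t-1}$.
- $\phi_p$ and $\Phi_P$ are the non-seasonal and seasonal autoregressive polynomials of orders $p$ and $P$.
- $\theta_q$ and $\Theta_Q$ are the corresponding moving-average polynomials of orders $q$ and $Q$.
- $d$ and $D$ are the non-seasonal and seasonal differencing orders.
- $\varepsilon_t$ is white noise.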
Stationarity of the monthly IgG seroprevalence time series for Gorakhpur division was assessed visually using autocorrelation function (ACF) plots. It was also assessed formally with three statistical tests:

- the Ljung-Box test for independence (null hypothesis: no autocorrelation up to a given number of lags);
- the augmented Dickey-Fuller (ADF) t-statistic test for a unit root (null hypothesis: a unit root is present);
- the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for level and trend stationarity (null hypothesis: the series is stationary around a constant level or a deterministic trend, respectively).

Following this assessment, ACF and partial autocorrelation function (PACF) plots were used to guide manual selection of autoregressive time-series models for IgG seroprevalence. Models were also fitted automatically using the ‘auto.arima’ function in the R package ‘forecast’ (Hyndman & Khandakar, 2008). Model fit was assessed visually using residual plots and statistically by minimising Akaike’s information criterion (AIC).
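The Ljung-Box statistic has a simple closed form, $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the sample autocorrelation at lag $k$. The sketch below computes the sample ACF, $Q$, and its chi-squared p-value in plain Python (standard library only). It is purely illustrative: it is not the software used in this analysis, and the function names are hypothetical. The ADF and KPSS tests, ARIMA fitting, and AIC comparison are omitted because they require a full regression or likelihood implementation.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (lagged cross-products over the lag-0 sum of squares)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 case (erfc) and add the recurrence terms.
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(
        y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, df // 2 + 1))


def ljung_box(x, h, fitted_params=0):
    """Return (Q, p-value) for the Ljung-Box test up to lag h.

    Under the null of no autocorrelation, Q is approximately chi-squared with
    h - fitted_params degrees of freedom. Set fitted_params > 0 when x holds
    residuals from a fitted ARMA model.
    """
    n = len(x)
    r = acf(x, h)
    q = n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, h + 1))
    return q, chi2_sf(q, h - fitted_params)
```

For example, `ljung_box(residuals, 24, fitted_params=p + q + P + Q)` would test a fitted SARIMA model's residuals over two years of monthly lags. Here `residuals` and the choice h = 24 are illustrative assumptions. A small p-value indicates remaining autocorrelation.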
Cross-correlation functions were then used to assess whether the IgG seroprevalence time series was correlated with three monthly climate variables: total rainfall, mean relative humidity and mean minimum temperature. Where a correlation was detected, lagged values of that climate variable were included in the model. Fit was again assessed visually using residual plots and statistically by minimising AIC.
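A minimal sketch of the cross-correlation step follows, again in standard-library Python with hypothetical function names:

- `ccf_at_lags` estimates the correlation between seroprevalence in month t and a climate variable in month t - k, for k = 0, 1, ..., max_lag.
- `lag_series` builds the corresponding lagged covariate.

The ±1.96/√n band is the usual approximate 95% bound for a single sample cross-correlation. It is an assumption here, not a threshold stated in this section. It is strictly valid only when at least one of the series is white noise, which is why series are often prewhitened before the CCF is inspected.

```python
import math


def ccf_at_lags(response, driver, max_lag):
    """Return {k: corr(response_t, driver_{t-k})} for k = 0..max_lag.

    Uses the usual CCF estimator: the cross-product over overlapping months,
    divided by n, scaled by the product of the two lag-0 standard deviations.
    """
    n = len(response)
    if len(driver) != n:
        raise ValueError("series must have the same length")
    mr = sum(response) / n
    md = sum(driver) / n
    sr = math.sqrt(sum((v - mr) ** 2 for v in response) / n)
    sd = math.sqrt(sum((v - md) ** 2 for v in driver) / n)
    out = {}
    for k in range(max_lag + 1):
        cov = sum((response[t] - mr) * (driver[t - k] - md) for t in range(k, n)) / n
        out[k] = cov / (sr * sd)
    return out


def approx_ccf_bound(n):
    """Approximate 95% bound for a sample cross-correlation from n paired months."""
    return 1.96 / math.sqrt(n)


def lag_series(x, k):
    """Shift x by k months so x_{t-k} aligns with month t; the first k entries are None."""
    return [None] * k + list(x[: len(x) - k])
```

As an illustration, if `ccf_at_lags(sero, rainfall, 6)` exceeded the bound at k = 2, `lag_series(rainfall, 2)` would supply rainfall from two months earlier as a regressor (for instance through the `xreg` argument of `auto.arima`). Here `sero` and `rainfall` are hypothetical variable names. The leading months with no lagged value must be dropped or otherwise handled before fitting.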