Discussion
We found that including school absences in seasonal models improved community-level confirmed influenza predictions over multiple seasons within Allegheny County. All-school absence models modestly improved predictions, reducing MAE by 5% across multiple validations, but school- and grade-specific absence models produced better predictions, reflecting underlying age-specific differences in infections. Elementary school (K-5th grade) absence models decreased MAEs by 1-16% compared to models using 6-12th grade absences, suggesting that younger students' absences were illness-related while older students' absences were more often unrelated to influenza or illness. In the school cohort data, ILI-specific absences outperformed all-cause absences in single-season (2007-2008 and 2012-2013) validations and when pooled across seasons. Elementary school and K-5th grade-specific all-cause absences, and potentially ILI-specific absences, may serve as surveillance indicators for the larger community.
Compared to seasonal models, those including all-cause absences improved MAE and R2 estimates, suggesting that, after accounting for seasonal factors, school absences improved influenza predictions. Our analysis is one of few using weekly all-cause absences at various administrative levels (i.e., school type and grade) to predict influenza. Whereas other studies used cause-specific absences to detect elementary school influenza outbreaks(6), ours evaluated how all-cause absences at different school types and grade levels performed as predictors. As evidenced by higher R2 and lower relMAEs from elementary school absence models, absences from younger school-aged children better reflect infections during the influenza season and are a proxy for the younger age groups that experience higher infection rates and increased susceptibility(5, 25, 26). In contrast, middle and high school absences were noisier prediction signals, possibly because older students had more non-influenza-related absences (consistent with the overall higher absenteeism rates observed in these schools over time). Lower relMAEs from models using absences in individual lower grades (K-5th) across multiple validations further support our findings. Hence, elementary school absences could be useful for influenza surveillance.
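To make the relMAE comparison concrete, the sketch below (illustrative only, not our analysis code) fits a simple least-squares regression of synthetic weekly confirmed influenza counts on harmonic seasonal terms, with and without a one-week-lagged absence covariate, and reports each model's hold-out MAE and their ratio (relMAE); the synthetic data, lag choice, and model form are assumptions made for illustration.

```python
# Illustrative sketch only (not the study's code): compares a seasonal-only
# regression with a seasonal + lagged-absence regression on synthetic data
# and reports relMAE = MAE(absence model) / MAE(seasonal-only model).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic weekly series standing in for real surveillance data (assumed)
weeks = np.arange(3 * 52)                                  # three seasons of weeks
seasonal = 20 + 15 * np.sin(2 * np.pi * weeks / 52.0)      # seasonal influenza signal
absences = 0.5 * seasonal + rng.normal(0, 3, weeks.size)   # absences track infections
flu = np.clip(seasonal + 0.4 * np.roll(absences, 1)
              + rng.normal(0, 4, weeks.size), 0, None)     # confirmed influenza counts

def design(t, lagged_absences=None):
    """Harmonic seasonal terms, optionally plus a one-week-lagged absence column."""
    cols = [np.ones_like(t, dtype=float),
            np.sin(2 * np.pi * t / 52.0),
            np.cos(2 * np.pi * t / 52.0)]
    if lagged_absences is not None:
        cols.append(lagged_absences)
    return np.column_stack(cols)

lag1 = np.roll(absences, 1)                     # previous week's absences
train, test = weeks < 2 * 52, weeks >= 2 * 52   # fit on two seasons, predict the third

def holdout_mae(X):
    beta, *_ = np.linalg.lstsq(X[train], flu[train], rcond=None)
    return np.mean(np.abs(flu[test] - X[test] @ beta))

mae_seasonal = holdout_mae(design(weeks))
mae_absence = holdout_mae(design(weeks, lag1))
print(f"seasonal-only MAE:         {mae_seasonal:.2f}")
print(f"seasonal + absence MAE:    {mae_absence:.2f}")
print(f"relMAE (absence/seasonal): {mae_absence / mae_seasonal:.2f}")
```

A relMAE below 1 indicates that adding the absence covariate reduced hold-out error relative to the seasonal-only model, which is how the percentage MAE reductions above should be read.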
When we evaluated predictions from weekly all-cause and ILI-specific absence models built from the school-based cohort studies, ILI-specific absences predicted influenza better than all-cause absences, based on lower MAEs and higher R2 for specific seasons and when pooled. Other studies also found that ILI-specific absences were a proxy for influenza when evaluating vaccine impacts(27), suggesting ILI-specific absences likely capture actual influenza infections. We could not conduct cause-specific absence surveillance for more than one influenza season per study, nor could we perform school-type- and grade-specific comparisons of all-cause and ILI-specific absences because of the short time period, but these may also be important predictors of influenza incidence.
Our study has some limitations. We did not evaluate our predictions during the 2009 pandemic because our county absence data were either limited to single seasons or only available after 2009, when participating schools' electronic absence surveillance began. Similarly, the cohort studies were funded for and conducted during the 2007, 2012, and 2015 seasons; therefore, we could not assess predictions during the 2009 pandemic. In the school-based cohort studies, not all absences were identified because of challenges contacting parents about absences, so our studies may underestimate the number of all-cause absences and, possibly, ILI-specific absences. Our predictions used school-based data from school districts within Allegheny County only; therefore, our results may not be generalizable to influenza transmission in other US counties. Additional data from other Pennsylvania counties or a representative sample of counties from other states would improve the generalizability of our predictions.
Others, such as teams participating in the CDC FluSight Challenge (an influenza prediction competition), have recently used climate data, past influenza incidence, and other data streams for prediction. In the CDC FluSight Challenge, external research teams predict weekly influenza cases, and evaluation metrics include the mean absolute scaled error, a measure of forecast accuracy(28, 29). Our MAE decreased by 5% when using county-level all-cause absence models, which is equivalent to including an additional 8 weeks of data in a nowcast model, like those used in the FluSight Challenge; this equates to a 5% reduction in mean absolute scaled error(30). Our results suggest that models including lower grades' absences may improve predictions, as seen by the 10% MAE decrease, and may yield further improvement when incorporated into ensemble models, like those used in FluSight(29).
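As a point of reference, mean absolute scaled error compares a forecast's MAE against that of a naive forecast that carries the previous week's observation forward; the minimal sketch below (with made-up numbers, not FluSight scoring code) shows the calculation.

```python
# Minimal illustration (not FluSight's scoring code) of mean absolute scaled
# error (MASE): forecast MAE scaled by the MAE of a naive forecast that
# simply carries the previous week's observation forward.
import numpy as np

def mean_absolute_scaled_error(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mae_forecast = np.mean(np.abs(actual - predicted))
    mae_naive = np.mean(np.abs(np.diff(actual)))   # one-week persistence benchmark
    return mae_forecast / mae_naive

# Hypothetical weekly confirmed influenza counts and model predictions
observed = [12, 18, 25, 40, 55, 48, 30, 20]
forecast = [14, 17, 28, 37, 52, 50, 33, 22]
print(f"MASE: {mean_absolute_scaled_error(observed, forecast):.2f}")
```

Values below 1 indicate that the forecast outperforms the naive benchmark.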
Our findings suggest that models using younger students' absences improve predictive performance. Real-time, day-to-day absence data are easy to collect, are readily available in many schools, and can provide more accurate predictions than surveillance mechanisms that rely on virologic confirmation and are susceptible to laboratory testing delays. Future studies could apply absence data to other prediction methodologies, such as ensemble methods and machine-learning algorithms, which may improve prediction accuracy and identify absence-related patterns not considered here. We demonstrate that grade-specific all-cause absences predict community-level influenza one week ahead when influenza- or cause-specific absences are unavailable, and we suggest that elementary school or lower-grade absenteeism during the influenza season can reflect influenza circulation. Using school indicators can inform influenza surveillance and control efforts, including annual vaccination; antiviral treatment or prophylaxis; and promotion of everyday preventive measures (i.e., staying home when sick, respiratory hygiene, and hand hygiene) to reduce school- and community-level influenza transmission.