2.2 Challenge 2: Addressing data missing not at random
Another key issue with human studies is that they likely underestimate the extent of poor health in aging populations because approaches for handling missing data are limited (Jackson et al., 2019; Jackson and Engelman, 2022). Selective dropout at very old ages, especially among unhealthy and socially disadvantaged persons, is common (Badawi et al., 1999; Coste et al., 2013; Duim and Lima Passos, 2020; Mirowsky and Reynolds, 2000; Purdie et al., 2002; Van Beijsterveldt et al., 2002; Young et al., 2006). It can leave a late-life sample increasingly composed of robust individuals. Moreover, many approaches to research on health across the life course do not account for the competing risks of mortality, health change, and attrition across groups. This conceals persistent, socially driven health inequities in late life, because more privileged groups, who tend to live longer, often accumulate more health penalties over those additional years (Jackson and Engelman, 2022).
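The selection mechanism described above can be made concrete with a small calculation. The sketch below is illustrative only and is not drawn from the cited studies. It assumes a single wave of dropout whose probability depends only on a person's current (unobserved) health, which makes the data missing not at random. The function names and the dropout rates are hypothetical. It compares the true prevalence of poor health with the prevalence among those who remain in the sample, both analytically and by simulation.

```python
import random


def expected_observed_prevalence(true_prev: float, drop_unhealthy: float,
                                 drop_healthy: float) -> float:
    """Share of remaining participants in poor health after one wave of dropout.

    Assumes (hypothetically) that dropout depends only on current health,
    i.e. the missingness is not at random with respect to health.
    """
    stay_unhealthy = true_prev * (1.0 - drop_unhealthy)
    stay_healthy = (1.0 - true_prev) * (1.0 - drop_healthy)
    return stay_unhealthy / (stay_unhealthy + stay_healthy)


def simulate_observed_prevalence(n: int, true_prev: float, drop_unhealthy: float,
                                 drop_healthy: float, seed: int = 1) -> float:
    """Monte Carlo version: draw each person's health, then dropout, and
    tabulate poor health among those who stay in the sample."""
    rng = random.Random(seed)
    observed = unhealthy_observed = 0
    for _ in range(n):
        unhealthy = rng.random() < true_prev
        p_drop = drop_unhealthy if unhealthy else drop_healthy
        if rng.random() >= p_drop:  # person remains in the sample
            observed += 1
            unhealthy_observed += unhealthy  # bool counts as 0 or 1
    return unhealthy_observed / observed


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    true_prev, du, dh = 0.40, 0.50, 0.10
    print(f"true prevalence:               {true_prev:.3f}")
    print(f"expected observed prevalence:  {expected_observed_prevalence(true_prev, du, dh):.3f}")
    print(f"simulated observed prevalence: "
          f"{simulate_observed_prevalence(100_000, true_prev, du, dh):.3f}")
```

With these assumed rates, the expected observed prevalence is 0.2 / 0.74, or about 0.27, against a true value of 0.40. When both groups drop out at the same rate, the two values coincide. This is why health-selective dropout, rather than missingness as such, is what biases late-life estimates downward.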
As the limitations of missing data have been increasingly recognized, methods for handling data missing at random, such as multiple imputation and maximum likelihood estimation (Graham, 2009), have become more widely used. However, these methods cannot remove the bias associated with data missing not at random (Goldberg et al., 2021). In contrast, multi-state models offer flexibility in the number of meaningful life states used to describe individual trajectories, allowing us to incorporate temporary missingness as a discrete state in the model (Engelman and Jackson, 2019). Here, temporary missingness becomes a life state that individuals transition into when they leave the study and out of when they return to it. Multi-state models are thus powerful tools for empirically quantifying the likelihood that a person in a particular health state will leave the sample and later return to it. Because mortality (an absorbing state) and other types of attrition can also be readily distinguished and incorporated, multi-state models explicitly account for the contribution of different types of missing data to the cohort’s health experiences (Engelman and Jackson, 2019).
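To illustrate how temporary missingness can enter a multi-state model as its own state, here is a minimal sketch. It is not the specification used by Engelman and Jackson (2019). It assumes a discrete-time, time-homogeneous, first-order Markov chain estimated by counting wave-to-wave transitions, which is the maximum-likelihood estimator for such a chain. The state labels, the coding rule, and the toy histories are all hypothetical. A skipped wave is coded as temporary missingness if the person is seen again later, and as permanent attrition otherwise. Death and permanent attrition are both treated as absorbing.

```python
from collections import Counter, defaultdict

# Hypothetical state labels for this sketch.
HEALTHY, UNHEALTHY = "healthy", "unhealthy"
MISSING = "missing"      # temporary: skips a wave but is observed again later
ATTRITED = "attrited"    # permanent: never observed again, no death recorded
DEAD = "dead"            # absorbing
ABSORBING = {ATTRITED, DEAD}


def code_missingness(waves):
    """Turn raw wave records (HEALTHY, UNHEALTHY, DEAD, or None for no
    interview) into model states, stopping at the first absorbing state."""
    coded = []
    for i, obs in enumerate(waves):
        if obs is None:
            seen_later = any(w is not None for w in waves[i + 1:])
            obs = MISSING if seen_later else ATTRITED
        coded.append(obs)
        if obs in ABSORBING:
            break  # nothing is modelled after death or permanent attrition
    return coded


def estimate_transitions(histories):
    """Count observed wave-to-wave moves and normalise each origin row."""
    counts = defaultdict(Counter)
    for waves in histories:
        states = code_missingness(waves)
        for origin, dest in zip(states, states[1:]):
            counts[origin][dest] += 1
    probs = {}
    for origin, row in counts.items():
        total = sum(row.values())
        probs[origin] = {dest: n / total for dest, n in row.items()}
    return probs


# Hypothetical toy panel: one list per person, one entry per survey wave.
histories = [
    [HEALTHY, HEALTHY, UNHEALTHY, DEAD],
    [HEALTHY, None, HEALTHY, HEALTHY],
    [UNHEALTHY, None, UNHEALTHY, DEAD],
    [UNHEALTHY, None, None, None],
    [HEALTHY, UNHEALTHY, None, None],
    [UNHEALTHY, UNHEALTHY, None, DEAD],
]

if __name__ == "__main__":
    P = estimate_transitions(histories)
    for origin in (HEALTHY, UNHEALTHY, MISSING):
        row = ", ".join(f"{d}: {p:.2f}" for d, p in sorted(P[origin].items()))
        print(f"{origin:>9} -> {row}")
```

In this toy panel, a healthy person moves to temporary missingness with probability 1/5. An unhealthy person moves to temporary missingness, permanent attrition, or death with probability 2/7 each. The row for the missing state then estimates how often people re-enter the sample and in which health state. A real analysis would more likely use a continuous-time model with covariates. Note also that a trailing gap could hide an unrecorded death, which this simple coding rule would treat as attrition.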