2.2 Challenge 2: Addressing data missing not at random
Another key issue with human studies is that they likely underestimate
the extent of poor health in aging populations given limited approaches
to handle missing data (Jackson et al., 2019; Jackson and Engelman,
2022). Selective dropout at very old ages, especially among unhealthy
and socially disadvantaged persons, is common (Badawi et al., 1999;
Coste et al., 2013; Duim and Lima Passos, 2020; Mirowsky and Reynolds,
2000; Purdie et al., 2002; Van Beijsterveldt et al., 2002; Young et al.,
2006) and can result in the selection of robust individuals in late
life. Moreover, many approaches to research on health across the life
course do not account for competing risks of mortality, health change,
and attrition across groups. This omission can conceal persistent
socially driven health inequities in late life because more privileged
groups, who live longer, often accumulate more health penalties
(Jackson and Engelman, 2022).
As the limitations of missing data have been increasingly recognized,
methods addressing data missing at random have become more widely used
(e.g., multiple imputation, maximum likelihood; Graham, 2009). However,
these methods cannot remove the bias associated with data missing not at
random (Goldberg et al., 2021). In contrast, multi-state models provide
flexibility in the number of meaningful life states describing
individual trajectories, allowing us to incorporate temporary
missingness as a discrete state in the model (Engelman and Jackson,
2019). Here, temporary missingness becomes a life state that individuals
can transition into if they leave the study, or out of if they return to
the study. Multi-state models are thus powerful tools that make it
possible to empirically quantify the likelihood that a person in a
given health state will leave and return to the sample. Because
mortality (i.e., an absorbing state) and other types of attrition can also
be easily distinguished and incorporated, multi-state models explicitly
account for the contribution of different types of missing data to the
cohort’s health experiences (Engelman and Jackson, 2019).
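
To make the idea of missingness as a life state concrete, the following is a minimal sketch, not the estimation procedure used in the cited work. It assumes a discrete-time, first-order Markov simplification in which each person is observed at equally spaced waves, and it estimates transition probabilities by simple counting. The state labels, the function name transition_probabilities, and the toy trajectories are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical state space: two health states, temporary missingness,
# and death as the absorbing state.
STATES = ["healthy", "unhealthy", "missing", "dead"]


def transition_probabilities(trajectories):
    """Estimate wave-to-wave transition probabilities by counting.

    Each trajectory is a list of states, one per study wave.
    Returns {origin: {destination: probability}} for every origin
    state with at least one observed outgoing transition.
    """
    counts = Counter()
    for traj in trajectories:
        for origin, dest in zip(traj, traj[1:]):
            if origin == "dead":
                break  # absorbing state: no transitions out of death
            counts[(origin, dest)] += 1

    probs = {}
    for origin in STATES:
        total = sum(counts[(origin, dest)] for dest in STATES)
        if total == 0:
            continue  # no outgoing transitions observed (e.g., "dead")
        probs[origin] = {dest: counts[(origin, dest)] / total for dest in STATES}
    return probs


# Toy data (illustrative only, not drawn from any real study).
trajectories = [
    ["healthy", "healthy", "unhealthy", "missing", "unhealthy", "dead"],
    ["healthy", "unhealthy", "missing", "missing", "dead"],
    ["healthy", "healthy", "healthy", "missing", "healthy"],
    ["unhealthy", "unhealthy", "dead"],
]

probs = transition_probabilities(trajectories)
for origin, row in probs.items():
    print(origin, {dest: round(p, 3) for dest, p in row.items()})
```

In this toy example, the estimated probability of moving into the missing state differs by health state of origin, which is the kind of health-dependent dropout that methods assuming data are missing at random cannot capture. Transitions out of the missing state (back to a health state, or to death) are estimated in the same way, so return to the study and mortality after dropout are both represented explicitly.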