Andrew Hoegh

and 6 more

1. The COVID-19 pandemic has highlighted the importance of efficient sampling strategies and statistical methods for monitoring infection prevalence, both in humans and in reservoir hosts. Pooled testing can be an efficient tool for learning pathogen prevalence in a population. Typically, pooled testing requires a second-phase follow-up procedure to identify infected individuals, but when the goal is solely to learn prevalence in a population, such as a reservoir host, there are more efficient methods for allocating the second-phase samples.

2. To estimate pathogen prevalence in a population, this manuscript presents an approach for data integration with two-phase testing of pooled samples that allows more efficient estimation of prevalence with fewer samples than traditional methods. The first phase uses pooled samples to estimate the population prevalence and to inform efficient strategies for the second phase. To combine information from both phases, we introduce a Bayesian data integration procedure that combines pooled samples with individual samples for joint inference about the population prevalence.

3. Data integration procedures result in more efficient estimation of prevalence than traditional procedures that use only individual samples or a single phase of pooled sampling.

4. The manuscript presents guidance on implementing the first-phase and second-phase sampling plans using data integration. Such methods can be used to assess the risk of pathogen spillover from reservoir hosts to humans, or to track pathogens such as SARS-CoV-2 in populations.
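
To make the idea of combining pooled and individual samples concrete, the following is a minimal Python sketch of one way such a joint posterior could be computed. It is not the manuscript's procedure. It assumes a perfectly accurate test, a Beta(a, b) prior on the prevalence p, equal pool sizes, and a simple grid approximation; the function names `prevalence_posterior` and `summarize` and all parameter names are hypothetical. Under these assumptions a pool of size k tests positive with probability 1 - (1 - p)^k, while an individual sample tests positive with probability p, and both likelihoods are multiplied on a shared grid of p values.

```python
import numpy as np


def prevalence_posterior(n_pools, pos_pools, pool_size, n_indiv, pos_indiv,
                         a=1.0, b=1.0, grid_size=2001):
    """Grid approximation to the posterior of prevalence p.

    Assumptions (illustrative, not from the manuscript): perfect test,
    Beta(a, b) prior, all pools have the same size.
    """
    # Interior grid on (0, 1); the endpoints are dropped to avoid log(0).
    p = np.linspace(0.0, 1.0, grid_size + 2)[1:-1]
    # Probability that a pool of `pool_size` individuals tests positive.
    q = 1.0 - (1.0 - p) ** pool_size

    log_prior = (a - 1.0) * np.log(p) + (b - 1.0) * np.log1p(-p)
    # Pooled (phase one) binomial likelihood; log(1 - q) = pool_size * log(1 - p).
    log_pooled = (pos_pools * np.log(q)
                  + (n_pools - pos_pools) * pool_size * np.log1p(-p))
    # Individual (phase two) binomial likelihood.
    log_indiv = pos_indiv * np.log(p) + (n_indiv - pos_indiv) * np.log1p(-p)

    log_post = log_prior + log_pooled + log_indiv
    log_post -= log_post.max()          # stabilize before exponentiating
    w = np.exp(log_post)
    w /= w.sum()                        # normalize to grid probabilities
    return p, w


def summarize(p, w, level=0.95):
    """Posterior mean and equal-tailed credible interval from grid weights."""
    mean = float(np.sum(p * w))
    cdf = np.cumsum(w)
    lower = float(p[np.searchsorted(cdf, (1.0 - level) / 2.0)])
    upper = float(p[np.searchsorted(cdf, 1.0 - (1.0 - level) / 2.0)])
    return mean, lower, upper


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    p, w = prevalence_posterior(n_pools=30, pos_pools=6, pool_size=5,
                                n_indiv=100, pos_indiv=6)
    print(summarize(p, w))
```

Setting `n_indiv=0` and `pos_indiv=0` gives the pooled-only posterior, and setting `n_pools=0` and `pos_pools=0` gives the individual-only posterior, so the same function can be used to compare interval widths across the sampling designs discussed above. Imperfect sensitivity or specificity would require changing the two positive-test probabilities.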