Statistical analysis
Given the repeated assessment of allergic outcomes, generalized estimating equations (GEE) with logit link and exchangeable correlation structure were used to regress the prevalence of allergic outcomes up to 15 years on the exposures of interest at birth.
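As a rough sketch of the marginal model described above, the GEE specification can be written as follows. The notation is ours and not taken from the source: $Y_{ij}$ is the binary outcome of participant $i$ at follow-up $j$, $\mathbf{e}_i$ is the (possibly categorized) exposure at birth, $\mathbf{z}_{ij}$ collects the covariates including age, and $\alpha$ is the single working correlation shared by all pairs of repeated measurements under the exchangeable structure.

```latex
\operatorname{logit}\,\Pr\!\left(Y_{ij}=1 \mid \mathbf{e}_i, \mathbf{z}_{ij}\right)
  = \beta_0 + \boldsymbol{\beta}_E^{\top}\mathbf{e}_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_{ij},
\qquad
\operatorname{Corr}\!\left(Y_{ij}, Y_{ik}\right) = \alpha \quad (j \neq k)
```

Under this formulation, $\exp(\boldsymbol{\beta}_E)$ would be read as population-averaged odds ratios for the exposure categories.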
First, we used generalized additive models (GAM)21 to test for non-linearity in the univariate relationship between each exposure and each “ever” outcome (e.g., ever-asthma is a constructed outcome variable that is 1 if at least one of the age-specific asthma variables is 1 and 0 otherwise). Associations deviated from linearity in many cases. Therefore, we categorized NDVI variables and tree counts for the 500 m and 1000 m buffers into tertiles. Tree counts for the 100 m buffer were dichotomized (0 vs. ≥ 1).
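The variable constructions in this paragraph can be illustrated with a minimal Python sketch (the analyses themselves were run in R, so this is not the authors' code). The function names are hypothetical. The handling of missing values (ignored when deriving the "ever" outcome) and the exact tertile cut-point rule (inclusive quantile method) are assumptions, since the text does not specify them.

```python
from statistics import quantiles


def ever_outcome(age_specific):
    """Return 1 if at least one age-specific indicator is 1, otherwise 0.

    Assumption: missing values (None) are simply ignored.
    """
    return 1 if any(v == 1 for v in age_specific if v is not None) else 0


def tertile_labels(values):
    """Label each value 1, 2 or 3 according to the sample tertiles.

    Assumption: cut points come from the 'inclusive' quantile method.
    """
    lower, upper = quantiles(values, n=3, method="inclusive")
    return [1 if v <= lower else 2 if v <= upper else 3 for v in values]


def dichotomize_tree_count(count):
    """Dichotomize a 100 m buffer tree count: 0 trees vs. at least 1 tree."""
    return 0 if count == 0 else 1


if __name__ == "__main__":
    print(ever_outcome([0, None, 1, 0]))
    print(tertile_labels([1, 2, 3, 4, 5, 6, 7, 8, 9]))
    print(dichotomize_tree_count(4))
```

Heavily tied values (for example, many zero tree counts) can make tertiles unequal in size under any cut-point rule, which is one plausible reason for dichotomizing the 100 m counts instead.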
In line with our previous analyses on greenspace and allergic outcomes,11,12 we considered the 500 m buffer variables our main exposures of interest. The main models were a priori adjusted for age, sex, family history of allergic diseases, parental education (based on the highest number of years of school education of either parent: <10 years, =10 years, >10 years, according to the German educational system) and season of birth (October to January, February to March, May to July, August to September, corresponding to next to no pollen, tree pollen season, grass pollen season, ragweed and mugwort pollen season, respectively). Season of birth was categorized according to a regional pollen calendar (http://www.pollenstiftung.de/pollenvorhersage/pollenflug-kalender).
A series of sensitivity analyses was performed. These included crude analyses, adjusted only for age, and additionally adjusted analyses, in which the main models were also controlled for maternal smoking during pregnancy and tobacco smoke exposure at home until age 4 (yes, likely no, no), presence of older siblings, exclusive breastfeeding during the first four months and birth weight (grams). We also checked the robustness of the results by using the 100 m and 1000 m buffers. Moreover, the models for total tree count and allergenic tree count according to definitions 1 and 2 were additionally adjusted for NDVI to account for vegetation not captured by the tree registry, e.g., trees on private grounds, herbs and bushes. Furthermore, we reran the analyses excluding participants with partially missing outcome data. Effect modification by age was tested by introducing an interaction term between the exposure variable and age. Additionally, we stratified our analyses by whether participants changed their place of residence between birth and 2 years of age. Finally, we checked effect modification by whether participants resided within 300 m of the nearest urban green space or forest of at least 1 ha.22 Green spaces were derived using the Urban Atlas land use data for the year 2006 (http://www.eea.europa.eu/data-and-maps/data/urban-atlas).
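The interaction-term test for effect modification by age mentioned above can be sketched as follows. The notation is ours and not taken from the source: $e_i$ is written as a single exposure contrast for simplicity (a categorized exposure would contribute one such term per non-reference category), $a_{ij}$ is the age of participant $i$ at follow-up $j$, and $\mathbf{z}_{ij}$ holds the remaining covariates.

```latex
\operatorname{logit}\,\Pr\!\left(Y_{ij}=1\right)
  = \beta_0 + \beta_E\, e_i + \beta_A\, a_{ij}
    + \beta_{EA}\,\left(e_i \times a_{ij}\right)
    + \boldsymbol{\gamma}^{\top}\mathbf{z}_{ij},
\qquad
H_0\colon \beta_{EA} = 0
```

In this form, rejecting $H_0$ would suggest that the exposure-outcome association differs by age.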
To check whether air pollution modified the associations of interest, an interaction term between the air pollutant and the exposure variable was introduced. Additionally, models were stratified by tertiles of NO2 and summer ozone levels. In all stratified analyses, we combined the low and medium categories of parental education, as there were sometimes too few cases in the lowest educational category.
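The parental education coding, including the merging of the low and medium categories used in the stratified analyses, could look roughly like the following Python sketch (not the authors' R code). The function name, the labels "high" and "low/medium", and the treatment of a missing parent value are assumptions made for illustration.

```python
def parental_education(mother_years, father_years, combine_low_medium=False):
    """Categorize parental education by the highest years of schooling of either parent.

    Categories follow the text: <10 years (low), =10 years (medium), >10 years (high).
    Assumption: a missing value (None) for one parent is ignored; if both are
    missing, the category is None.
    """
    known = [y for y in (mother_years, father_years) if y is not None]
    if not known:
        return None
    highest = max(known)
    if highest < 10:
        category = "low"
    elif highest == 10:
        category = "medium"
    else:
        category = "high"
    # In stratified analyses the sparse low category is merged with medium.
    if combine_low_medium and category in ("low", "medium"):
        return "low/medium"
    return category


if __name__ == "__main__":
    print(parental_education(9, 12))
    print(parental_education(9, 10, combine_low_medium=True))
```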
Data pre-processing and statistical analyses were done using the statistical software R 3.6.1 (Vienna, Austria).23 GEE models were fitted with the geeglm() function from the geepack package.24 GAM models were fitted using the gam() function from the mgcv package.25