Statistical analysis
Because allergic outcomes were assessed repeatedly, we used generalized
estimating equations (GEE) with a logit link and an exchangeable correlation
structure to regress the prevalence of allergic outcomes up to age
15 years on the exposures of interest at birth.
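The marginal model described above can be written in the standard Liang and Zeger GEE form. This is a general sketch, and the notation is ours rather than the authors'. Here $i$ indexes participants, $j$ indexes follow-up assessments, and $\mathbf{x}_{ij}$ stands for the exposure and adjustment variables. Exponentiating a coefficient in $\boldsymbol{\beta}$ gives the corresponding odds ratio.

```latex
\begin{aligned}
&\operatorname{logit}(\mu_{ij}) = \log\frac{\mu_{ij}}{1-\mu_{ij}} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta},
\qquad \mu_{ij} = \Pr(Y_{ij} = 1 \mid \mathbf{x}_{ij}) \\
&\mathbf{V}_i = \phi\,\mathbf{A}_i^{1/2}\,\mathbf{R}(\alpha)\,\mathbf{A}_i^{1/2},
\qquad \mathbf{A}_i = \operatorname{diag}\{\mu_{ij}(1-\mu_{ij})\},
\qquad R_{jk}(\alpha) = \begin{cases} 1, & j = k \\ \alpha, & j \neq k \end{cases} \\
&\sum_{i=1}^{N} \mathbf{D}_i^{\top}\mathbf{V}_i^{-1}\left(\mathbf{y}_i - \boldsymbol{\mu}_i\right) = \mathbf{0},
\qquad \mathbf{D}_i = \frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}^{\top}}
\end{aligned}
```

Under the exchangeable structure, any two assessments of the same child share a single correlation $\alpha$, regardless of how far apart in age they were made. With robust (sandwich) standard errors, inference on $\boldsymbol{\beta}$ remains valid even if this working structure is misspecified.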
First, we used generalized additive models (GAMs)21 to
test the non-linearity of the univariate relationship between each
exposure and each “ever” outcome (e.g., ever-asthma is a constructed
variable that equals 1 if at least one of the age-specific asthma
variables is 1, and 0 otherwise). Many of these associations
deviated from linearity.
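An “ever” outcome can be constructed as in the minimal Python sketch below. It illustrates the definition and is not the authors' R code. The function name is hypothetical. The text does not say how missing age-specific values were handled, so the sketch assumes they simply do not count as 1, which follows the literal definition.

```python
from typing import Optional, Sequence


def ever_outcome(age_specific: Sequence[Optional[int]]) -> int:
    """Return 1 if at least one age-specific indicator is 1, otherwise 0.

    Assumption: missing assessments (None) do not count as 1, following
    the literal "1 if at least one ... is 1, otherwise 0" definition.
    """
    return int(any(value == 1 for value in age_specific))


# Hypothetical asthma indicators from six follow-up assessments
print(ever_outcome([0, 0, 1, 0, None, 0]))  # prints 1
print(ever_outcome([0, None, 0, 0, 0, 0]))  # prints 0
```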
Therefore, we categorized the NDVI
variables and the tree counts for the 500 m and 1000 m buffers into
tertiles. Tree counts for the 100 m buffer were dichotomized (0 vs. ≥1).
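The categorization step can be sketched in Python as follows. The function names and example values are hypothetical. The sketch uses `statistics.quantiles(..., method="inclusive")`, which corresponds to R's default `quantile()` definition (type 7). Whether the original analysis used this quantile definition, and whether values that fall exactly on a cut point went to the lower category, are assumptions.

```python
import bisect
import statistics
from typing import List, Sequence


def tertile_categories(values: Sequence[float]) -> List[int]:
    """Assign each value to tertile 1, 2 or 3 of the sample distribution.

    method="inclusive" matches R's default quantile() (type 7); that the
    original analysis used this definition is an assumption.
    """
    cuts = statistics.quantiles(values, n=3, method="inclusive")
    # bisect_left puts a value equal to a cut point into the lower tertile,
    # i.e. right-closed intervals as in R's cut() default
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


def dichotomize_tree_count(count: int) -> int:
    """0 if no trees are registered in the 100 m buffer, 1 if at least one."""
    return 0 if count == 0 else 1


# Hypothetical 500 m NDVI values for nine participants
ndvi_500m = [0.44, 0.12, 0.58, 0.31, 0.66, 0.25, 0.37, 0.18, 0.52]
print(tertile_categories(ndvi_500m))  # [2, 1, 3, 2, 3, 1, 2, 1, 3]
print([dichotomize_tree_count(c) for c in [0, 3, 0, 1]])  # [0, 1, 0, 1]
```

With many tied values, which is common for tree counts, the three categories can end up unequal in size. This is one reason a sparse variable such as the 100 m tree count is better dichotomized.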
In line with our previous analyses on greenspace and allergic outcomes,11,12 we considered the 500 m buffer variables our
main exposures of interest. The main models were adjusted a priori for age, sex, family history of allergic diseases, parental
education (based on the highest number of years of school education of
either parent: <10 years, =10 years, >10 years,
according to the German educational system) and season of birth (October
to January, February to March, May to July, August to September,
corresponding to almost no pollen, the tree pollen season, the grass pollen
season, and the ragweed and mugwort pollen season, respectively). Season of
birth was categorized according to a regional pollen calendar
(http://www.pollenstiftung.de/pollenvorhersage/pollenflug-kalender).
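The parental education covariate can be derived as in this Python sketch. The function and argument names are hypothetical. The text does not say what happens when one parent's education is unknown, so using the parent with available data in that case is an assumption.

```python
from typing import Optional


def parental_education(mother_years: Optional[int],
                       father_years: Optional[int]) -> Optional[str]:
    """Categorize the highest number of school years of either parent."""
    # Assumption: use whichever parent's education is known
    known = [y for y in (mother_years, father_years) if y is not None]
    if not known:
        return None  # assumption: missing if unknown for both parents
    highest = max(known)
    if highest < 10:
        return "<10 years"
    if highest == 10:
        return "=10 years"
    return ">10 years"


print(parental_education(9, 13))    # >10 years
print(parental_education(10, 9))    # =10 years
print(parental_education(9, None))  # <10 years
```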
We performed a series of sensitivity analyses. These included crude
analyses adjusted only for age, and extended analyses in which the main
models were additionally adjusted for maternal smoking during pregnancy,
tobacco smoke exposure at home until age 4 (yes, likely no, no), presence
of older siblings, exclusive breastfeeding during the first four months,
and birth weight (grams). We also checked the robustness of the results
by using the 100 m and 1000 m buffers. Moreover, the models for total tree
count and allergenic tree count according to definitions 1 and 2 were
additionally adjusted for NDVI to account for vegetation not captured by
the tree registry, e.g., trees on private property, herbs, and bushes.
Furthermore, we reran the analyses excluding
participants with partially missing outcome data. Effect modification by
age was tested by introducing an interaction term between the exposure
variable and age. Additionally, we stratified our analyses by whether
participants changed their place of residence between birth and 2 years
of age. Finally, we checked effect modification by whether participants
resided within 300 m of the nearest urban green space or forest of at
least 1 ha.22 Green spaces were derived from the
Urban Atlas land use data for the year 2006
(http://www.eea.europa.eu/data-and-maps/data/urban-atlas).
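The 300 m proximity indicator can be sketched with plain planar geometry. The text does not state which GIS tools or distance definition were used, so the following are assumptions:

- Residential addresses and Urban Atlas polygons are in a projected coordinate system in metres.
- Proximity is the straight-line distance to the polygon boundary, or 0 for homes inside a polygon.
- Polygon area is computed with the shoelace formula.

All function names and example coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # projected x, y in metres (assumption)
Polygon = Sequence[Point]    # vertices in order, first vertex not repeated


def polygon_area(poly: Polygon) -> float:
    """Polygon area in square metres via the shoelace formula."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def contains(poly: Polygon, p: Point) -> bool:
    """Ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon(p: Point, poly: Polygon) -> float:
    """0 if p lies inside the polygon, else the distance to its boundary."""
    if contains(poly, p):
        return 0.0
    n = len(poly)
    return min(segment_distance(p, poly[i], poly[(i + 1) % n]) for i in range(n))


def near_large_green_space(home: Point, green_spaces: List[Polygon],
                           max_distance_m: float = 300.0,
                           min_area_m2: float = 10_000.0) -> bool:
    """True if home is within max_distance_m of a green space of >= 1 ha."""
    return any(
        distance_to_polygon(home, poly) <= max_distance_m
        for poly in green_spaces
        if polygon_area(poly) >= min_area_m2  # keep only areas of at least 1 ha
    )


park = [(0.0, 0.0), (200.0, 0.0), (200.0, 200.0), (0.0, 200.0)]  # 4 ha
pocket_park = [(1000.0, 0.0), (1050.0, 0.0), (1050.0, 50.0), (1000.0, 50.0)]  # 0.25 ha
print(near_large_green_space((400.0, 100.0), [park, pocket_park]))  # True (200 m away)
print(near_large_green_space((1100.0, 25.0), [park, pocket_park]))  # False
```

In the second call, the home is only 50 m from `pocket_park`, but that area is below 1 ha and is therefore ignored. A GIS library such as Shapely offers the same operations (`Polygon.area`, `Point.distance`) and handles real Urban Atlas geometries more robustly.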
To check whether air pollution modifies the association of interest, we
introduced an interaction term between the air pollutant and the exposure
variable. Additionally, models were stratified by tertiles of
NO2 and summer ozone levels. In all stratified analyses,
we combined the low and medium categories of parental education, because
some strata contained too few cases in the lowest educational category.
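An exposure-by-modifier interaction term, such as the exposure × NO2 term above, amounts to adding product columns to the design matrix. The age interaction in the sensitivity analyses is built the same way. The Python sketch below makes two assumptions: the exposure enters as tertile dummies with tertile 1 as the reference, and the modifier is a continuous value. Column and function names are hypothetical. The sketch also shows how the low and medium parental education categories can be merged for stratified models.

```python
from typing import Dict


def interaction_terms(exposure_tertile: int, modifier: float) -> Dict[str, float]:
    """Design-matrix entries for an exposure-tertile x modifier interaction.

    Tertile 1 is the reference category (assumption). The product columns
    let the exposure effect vary with the modifier (e.g. NO2 or age).
    """
    t2 = float(exposure_tertile == 2)
    t3 = float(exposure_tertile == 3)
    return {
        "T2": t2,
        "T3": t3,
        "modifier": modifier,
        "T2:modifier": t2 * modifier,
        "T3:modifier": t3 * modifier,
    }


def combine_education(category: str) -> str:
    """Merge low and medium parental education for stratified models."""
    if category in ("<10 years", "=10 years"):
        return "<=10 years"
    return ">10 years"


print(interaction_terms(3, 25.0))
# {'T2': 0.0, 'T3': 1.0, 'modifier': 25.0, 'T2:modifier': 0.0, 'T3:modifier': 25.0}
print(combine_education("=10 years"))  # <=10 years
```

In R's formula interface, writing `exposure * NO2` with a factor exposure expands to the main effects plus these product columns automatically.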
Data pre-processing and statistical analyses were performed with the
statistical software R 3.6.1 (Vienna, Austria).23 GEE
models were fitted with the geeglm() function from the geepack package.24 GAMs were fitted
with the gam() function from the mgcv package.25