Paper prepared for the annual meeting of the Population Association of
America, 2018.
Introduction
\label{introduction}
A large body of literature investigates neighborhood effects on
educational attainment and economic outcomes \cite{Sampson_2008,Wodtke_2011}. Social theory predicts negative
consequences of neighborhood disadvantage (e.g., on educational
outcomes) for several reasons. First, children living in
disadvantaged neighborhoods lack resources, as reflected, for example,
in high neighborhood poverty, poor school quality, and peer
composition. Second, poor neighborhoods are more likely to be
ethnically segregated, restricting the speech community to which
parents and children are exposed and thus affecting academic
performance. Lastly, poor neighborhoods are more likely to be exposed
to crime and to environmental hazards such as pollution.
Recent studies have shown a genetic component of educational attainment
and identified specific genetic loci robustly associated with years of
education and college attendance \cite{23722424,25201988,Okbay_2016}. We use a
polygenic score recently included in the National Longitudinal Study of
Adolescent to Adult Health (Add Health) as a composite measure of genetic
predisposition to higher educational attainment. This variable is
built on the results of a genome-wide association study of educational
attainment \cite{Okbay_2016}.
This study investigates whether neighborhood characteristics moderate the
effect of genetic predisposition on educational attainment. We hypothesize
that living in a context of poor economic resources mitigates the effect of
genetic predisposition to higher education. Growing up in a
neighborhood where the adult population is poorly educated may cancel
out genetic advantages in education. We investigate several dimensions
of neighborhood characteristics, such as the average educational level in the
neighborhood, the proportion of families living below the poverty level, and
the unemployment level. In addition, we investigate the role of educational
aspirations as a possible mechanism through which deprived neighborhoods
may mitigate the effect of genetic endowment.
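One way to formalize this hypothesis is as a gene-environment interaction in a regression model. The specification below is an illustrative sketch, not necessarily the model estimated in this paper; the symbols $Y_i$, $\mathit{PGS}_i$, $N_i$ and $\mathbf{X}_i$ are notation introduced here for exposition:
\begin{equation}
Y_i = \beta_0 + \beta_1 \mathit{PGS}_i + \beta_2 N_i + \beta_3 \left(\mathit{PGS}_i \times N_i\right) + \gamma^{\top}\mathbf{X}_i + \varepsilon_i ,
\end{equation}
where $Y_i$ is the educational outcome of individual $i$, $\mathit{PGS}_i$ is the standardised polygenic score, $N_i$ is a measure of neighborhood disadvantage (for example, the proportion of families below the poverty level), and $\mathbf{X}_i$ collects control variables. Under the hypothesis, $\beta_1 > 0$ and $\beta_3 < 0$: the association between the polygenic score and the outcome, $\beta_1 + \beta_3 N_i$, weakens as neighborhood disadvantage increases.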
To our knowledge, this is the first study that attempts to identify
gene-environment interactions using contextual measures of neighborhood
characteristics in education.
Data and Methods
\label{data-and-methods}
Data
The data we use come from Wave I and Wave IV of the National Longitudinal Study of Adolescent to Adult Health (Add Health, http://www.cpc.unc.edu/projects/addhealth), a panel study of a nationally representative sample of adolescents in the United States who were in grades 7 through 12 at Wave I (1995). The Add Health cohort (born between 1976 and 1982) has been followed into young adulthood through four in-home interviews (Wave I in 1995, Wave II in 1996, Wave III in 2001-02, and Wave IV in 2008-09); at Wave IV, respondents were between 24 and 32 years old. Add Health provides a unique opportunity to combine three types of information: longitudinal data on respondents’ socio-economic characteristics, such as educational attainment; data on geographical context (e.g., census block, census tract, county, and state characteristics); and genome-wide data on a subsample of 9,926 individuals.
Polygenic Score
We construct a polygenic score to predict educational attainment of married men and women using data from Add Health, building on the recent findings from a large-scale genome-wide association study (GWAS) of educational attainment \cite{Okbay_2016}. Rather than focusing on a limited number of genetic variants, polygenic scores (PGSs) use information from the entire genome (or a large proportion of it) to construct a measure of genetic predisposition to higher educational attainment \cite{Plomin_2010,Domingue_2014,Conley_2015}.
Recent advances in molecular genetics have made it possible and relatively inexpensive to measure millions of genetic variants in a single study. The most common type of genetic variation among people is the single nucleotide polymorphism (SNP). Each SNP represents a difference in a single DNA building block, called a nucleotide, and SNPs occur normally throughout a person’s DNA. SNPs are genetic markers that have two variants, called alleles. Since individuals inherit two copies of each SNP, one from each parent, there are three possible outcomes: 0, 1, or 2 copies of a specific allele.
We generate a polygenic score based on the most recent GWAS results on educational attainment available \cite{Okbay_2016}. The same polygenic score is used in the analysis of years of education and college attainment, since the genetic correlation between the two measures is very high, with the point estimate suggesting a perfect genetic correlation.
Using the summary statistics publicly available from the Social Science Genetic Association Consortium (http://www.thessgac.org), we construct a linear polygenic score in which each SNP is weighted by its effect size in the meta-analysis. The score is constructed using the software packages PLINK and PRSice
\cite{Purcell_2007,Euesden_2014}. We use the complete set of available SNPs ($p$-value $< 1$), clumped using the genotypic data as a reference panel for the linkage disequilibrium structure. Finally, we standardise the score to have mean zero and standard deviation one. Figure \ref{dist} shows the distribution of the unstandardised PGS for educational attainment calculated for 9,926 individuals in Add Health.
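In formula terms, the construction described above can be sketched as follows. The notation ($x_{ij}$, $\hat{\beta}_j$, $M$) is introduced here for illustration, and the software may additionally rescale the sum, so this should be read as a schematic rather than the exact computation:
\begin{equation}
\mathit{PGS}_i = \sum_{j=1}^{M} \hat{\beta}_j \, x_{ij}, \qquad x_{ij} \in \{0, 1, 2\},
\end{equation}
where $x_{ij}$ is the number of copies of the effect allele of SNP $j$ carried by individual $i$, $\hat{\beta}_j$ is the effect size of SNP $j$ estimated in the GWAS meta-analysis, and $M$ is the number of SNPs retained after clumping. The standardised score is then
\begin{equation}
\widetilde{\mathit{PGS}}_i = \frac{\mathit{PGS}_i - \overline{\mathit{PGS}}}{s_{\mathit{PGS}}},
\end{equation}
where $\overline{\mathit{PGS}}$ and $s_{\mathit{PGS}}$ denote the mean and standard deviation of the unstandardised score in the genotyped sample.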