Paper prepared for the Annual Meeting of the Population Association of America, 2018.

Introduction

\label{introduction}
A large body of literature investigates neighborhood effects on educational attainment and economic outcomes \cite{Sampson_2008,Wodtke_2011}. Social theory predicts negative consequences of neighborhood disadvantage, for example on educational outcomes, for several reasons. First, children living in disadvantaged neighborhoods lack resources, as reflected, for example, in high neighborhood poverty levels, poor school quality and unfavorable peer composition. Second, poor neighborhoods are more likely to be ethnically segregated, which restricts the speech community to which parents and children are exposed and thus affects academic performance. Finally, poor neighborhoods are more exposed to crime and to environmental hazards such as pollution.

Recent studies have shown that educational attainment has a genetic component and have identified specific genetic loci robustly associated with years of education and college attendance \cite{23722424,25201988,Okbay_2016}. We use a polygenic score recently included in the National Longitudinal Study of Adolescent to Adult Health (Add Health) as a composite measure of genetic predisposition to higher educational attainment. This score is built on the results of a genome-wide association study of educational attainment \cite{Okbay_2016}.
This study investigates whether neighborhood characteristics moderate the effect of genetic predisposition to higher educational attainment. We hypothesize that living in a context of poor economic resources attenuates the effect of genetic predisposition to higher education: for children born in a neighborhood where the adult population is poorly educated, the neighborhood context may cancel out genetic advantages in education. We investigate several dimensions of neighborhood characteristics, such as the average educational level in the neighborhood, the proportion of families living below the poverty line and the unemployment rate. In addition, we investigate the role of educational aspirations as a possible mechanism through which deprived neighborhoods may attenuate the effect of genetic endowment.
To our knowledge, this is the first study to examine gene-environment interactions in educational attainment using contextual measures of neighborhood characteristics.
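A gene-environment interaction of this kind is commonly tested with a regression that includes a product term between the polygenic score and a neighborhood measure. The sketch below is illustrative only: it assumes an ordinary least squares model $y_i = \beta_0 + \beta_1 \mathrm{PGS}_i + \beta_2 N_i + \beta_3 (\mathrm{PGS}_i \times N_i) + \varepsilon_i$, where $N_i$ is a hypothetical standardized neighborhood disadvantage index, and it runs on simulated data with made-up coefficients. It is not the specification or the estimates of this paper. Under the hypothesis above, $\beta_3$ would be negative.

```python
import numpy as np

# Simulated, standardized inputs (hypothetical values, not Add Health data)
rng = np.random.default_rng(2018)
n = 5000
pgs = rng.standard_normal(n)      # polygenic score: mean 0, sd 1
disadv = rng.standard_normal(n)   # hypothetical neighborhood disadvantage index N

# Simulated years of education in which the PGS effect shrinks as disadvantage rises
true_beta = np.array([13.0, 0.8, -0.6, -0.3])  # intercept, PGS, N, PGS x N
X = np.column_stack([np.ones(n), pgs, disadv, pgs * disadv])
years = X @ true_beta + rng.normal(0.0, 2.0, n)

# Ordinary least squares fit of the interaction model
beta_hat, *_ = np.linalg.lstsq(X, years, rcond=None)
for name, b in zip(["intercept", "PGS", "N", "PGS x N"], beta_hat):
    print(f"{name:>9}: {b:6.3f}")

# Marginal effect of the PGS, beta_1 + beta_3 * N, at low and high disadvantage
for level in (-1.0, 1.0):
    print(f"PGS effect at N = {level:+.0f} SD: {beta_hat[1] + beta_hat[3] * level:.3f}")
```

Because both the PGS and $N$ are standardized in this sketch, $\beta_1$ is the PGS effect at the average level of disadvantage, and $\beta_1 + \beta_3 N$ gives the effect at any other level.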

Data and Methods

\label{data-and-methods}

Data

The data we use come from Wave I and Wave IV of the National Longitudinal Study of Adolescent to Adult Health (Add Health, http://www.cpc.unc.edu/projects/addhealth), a panel study of a nationally representative sample of adolescents in the United States who were in grades 7 through 12 at Wave I (1995). The Add Health cohort (born between 1976 and 1982) has been followed into young adulthood with four in-home interviews (Wave I in 1995, Wave II in 1996, Wave III in 2001--02 and Wave IV in 2008--09); at Wave IV, respondents were between 24 and 32 years old. Add Health offers a unique opportunity to combine three types of information: longitudinal data on respondents’ socio-economic characteristics, such as educational attainment; data on geographical context (e.g., census block, census tract, county and state characteristics); and genome-wide data on a subsample of 9,926 individuals.

Polygenic Score

We construct a polygenic score to predict the educational attainment of married men and women in Add Health, building on the recent findings of a large-scale genome-wide association study (GWAS) of educational attainment \cite{Okbay_2016}. Rather than focusing on a limited number of genetic variants, polygenic scores (PGSs) aggregate information from a large number of variants across the genome into a single measure of genetic predisposition, here to higher educational attainment \cite{Plomin_2010,Domingue_2014,Conley_2015}.
Recent advances in molecular genetics have made it possible, and relatively inexpensive, to measure millions of genetic variants in a single study. The most common type of genetic variation among people is the single nucleotide polymorphism (SNP), a difference in a single DNA building block, called a nucleotide. SNPs occur normally throughout a person’s DNA and serve as genetic markers that typically have two variants, called alleles. Since individuals inherit two copies of each SNP, one from each parent, there are three possible outcomes for each SNP: 0, 1 or 2 copies of a specific allele.
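The 0/1/2 coding can be made concrete with a small sketch that reduces each genotype to the number of copies of a chosen (effect) allele. The function name `allele_count`, the SNP identifiers, the genotypes and the effect alleles below are all hypothetical and chosen only for illustration.

```python
def allele_count(genotype: str, effect_allele: str) -> int:
    """Return how many of the two inherited alleles equal the effect allele (0, 1 or 2)."""
    if len(genotype) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    # Each True counts as 1, so the sum is the number of effect-allele copies
    return sum(allele == effect_allele for allele in genotype)


# Hypothetical genotypes for one individual: SNP id -> (genotype, effect allele)
person = {
    "snp_1": ("AA", "A"),
    "snp_2": ("AG", "A"),
    "snp_3": ("GG", "A"),
}

dosages = {snp: allele_count(gt, ea) for snp, (gt, ea) in person.items()}
print(dosages)  # {'snp_1': 2, 'snp_2': 1, 'snp_3': 0}
```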
We generate the polygenic score from the most recent available GWAS results on educational attainment \cite{Okbay_2016}. The same score is used in the analyses of both years of education and college attainment, because the genetic correlation between the two measures is very high, with the point estimate suggesting a perfect genetic correlation.
Using the publicly available summary statistics from the Social Science Genetic Association Consortium (http://www.thessgac.org), we construct a linear polygenic score in which each SNP is weighted by its effect size estimated in the meta-analysis. The score is constructed with the software packages PLINK and PRSice \cite{Purcell_2007,Euesden_2014}. We use the complete set of available SNPs (p-value threshold $p < 1$), and the SNPs entering the score are clumped using the Add Health genotype data as a reference panel for the linkage disequilibrium (LD) structure. Finally, we standardize the score to have mean zero and standard deviation one. Figure \ref{dist} shows the distribution of the unstandardized PGS for educational attainment calculated for 9,926 individuals in Add Health.
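For individual $i$, a linear score of this kind takes the form $\mathrm{PGS}_i = \sum_{j} \hat{\beta}_j x_{ij}$, where $x_{ij} \in \{0, 1, 2\}$ is the count of the effect allele at SNP $j$, $\hat{\beta}_j$ is its GWAS effect size, and the sum runs over the SNPs retained after clumping. The following NumPy sketch illustrates clumping, scoring and standardization on simulated data. It is not the PLINK or PRSice implementation: the greedy clumping rule, the $r^2$ threshold of 0.1, the 250 kb window and the helper names `clump`, `polygenic_score` and `standardize` are assumptions made for illustration, and all genotypes, effect sizes and p-values are made up.

```python
import numpy as np


def clump(pvals, positions, genotypes, r2_max=0.1, window=250_000):
    """Greedy LD clumping on one chromosome (illustrative parameter values).

    Walks SNPs from most to least significant. Each SNP not yet clumped becomes an
    index SNP, and every not-yet-clumped SNP within `window` base pairs whose
    squared correlation with it exceeds `r2_max` is dropped.
    `genotypes` (n individuals x m SNPs, allele counts) serves as the LD reference.
    """
    clumped = np.zeros(len(pvals), dtype=bool)
    kept = []
    for j in np.argsort(pvals):
        if clumped[j]:
            continue
        kept.append(j)
        clumped[j] = True
        near = np.flatnonzero((np.abs(positions - positions[j]) <= window) & ~clumped)
        for k in near:
            r = np.corrcoef(genotypes[:, j], genotypes[:, k])[0, 1]
            if r * r > r2_max:
                clumped[k] = True
    return np.sort(np.array(kept))


def polygenic_score(genotypes, betas, snps):
    """Weighted sum over retained SNPs of allele count times GWAS effect size."""
    return genotypes[:, snps] @ betas[snps]


def standardize(x):
    """Rescale to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()


# Simulated inputs; all values are hypothetical, not SSGAC summary statistics
rng = np.random.default_rng(0)
n, m = 1000, 200
freqs = rng.uniform(0.1, 0.9, m)
genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)
genotypes[:, 1] = genotypes[:, 0]        # SNPs 0 and 1 in perfect LD
positions = np.arange(m) * 25_000        # base-pair positions, 25 kb apart
betas = rng.normal(0.0, 0.02, m)         # hypothetical GWAS effect sizes
pvals = rng.uniform(0.0, 1.0, m)         # all p < 1, so the threshold keeps every SNP

kept = clump(pvals, positions, genotypes)
raw_pgs = polygenic_score(genotypes, betas, kept)
pgs = standardize(raw_pgs)
print(len(kept), pgs.mean().round(6), pgs.std().round(6))
```

In this simulation only one of the two SNPs in perfect LD survives clumping. Software packages may rescale the weighted sum (for example, by the number of SNPs); a constant rescaling of this kind has no effect once the score is standardized.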