Material and Methods
Subjects
All subjects in whole-exome sequencing, subsequent validation, and whole
blood transcriptome sequencing were obtained from two large school-based
cohorts, the Beijing Child and Adolescent Metabolic Syndrome Study
(BCAMS), and the China Child and Adolescent Cardiovascular Health Study
(CCACH)(Liu et al., 2017; Shan et al., 2010). Adipose tissues were
obtained from Plastic Surgery Hospital, Chinese Academy of medical
Sciences. Individuals with hormone treatment, secondary obesity due to
endocrinopathy and serious intercurrent illness were excluded. All cases
and controls were unrelated Han Chinese in Beijing, China. Replication
and validation for promising variants were performed on 6,334 (2,480
obese and 3,854 normal weight) children. Written informed consent were
obtained from the parents or guardians of all subjects. The study was
approved by the ethics committee and institutional review board of the
Capital Institute of Paediatrics (IEC-C-008-A08-V.05.1).
Phenotype
Normal weight and common obesity were defined by age- and sex-specific
BMI cutoff points recommended by the International Obesity Task Force
(IOTF)(Cole, Flegal, Nicholls, & Jackson, 2007). Adiposity was defined
as body fat mass percentage greater than or equal to 20% for boys and
25% for girls aged ≤ 14 years and 30% for girls aged > 14
years(Chen, Lu, & Department of Disease Control Ministry of Health,
2004). Weight-to-Height ratio (WHtR) was calculated to define abdominal
obesity using a boundary value of 0.5(Browning, Hsieh, & Ashwell,
2010). Dyslipidemia was defined by serum total cholesterol (TC) ≥ 5.20
mmol/L, or triacylglycerol (TG) ≥ 1.70 mmol/L, or high-density
lipoprotein cholesterol (HDL-C) ≤ 1.04 mmol/L, or low-density
lipoprotein cholesterol (LDL-C) ≥ 3.37 mmol/L.
Whole-exome sequencing
Genomic
DNA was isolated from peripheral blood leukocytes using the QIAamp DNA
blood kit (Qiagen, Germany). Each DNA library was prepared from 5μg of
qualified genomic DNA which was sheared to 180-200 base pairs. The
SureSelect Human All Exon V4+UTR Kit (Agilent Technologies, USA) was
used to capture 71 Mbps of exons and untranslated regions (UTRs)
according to the manufacturer’s protocol. Paired-end sequencing (2×100
bp) was carried out with the HiSeq 2500 Sequencing System (Illumina
Inc., USA) at Berry Genomics, Co., Ltd.
Quality control and variant
calling
Trimmed reads were aligned to the human reference genome (hg19) using
the Burrows-Wheeler Alignment tool (v0.7.17)(H. Li & Durbin, 2009),
with mapping rates above 99.4%, and > 90% of the target
regions were completely covered with at least 10 × depth. Samples were
required to reach at least 20 × coverage over 70% of the exome target.
Duplicates were removed by Picard (v2.0.1).
Sequence variation, including SNVs and insertion/deletions, were
detected using Genome Analysis Toolkit (v4.1.3.0)(McKenna et al., 2010).
We excluded variants with a call rate of < 95% or
Hardy-Weinberg equilibrium (HWE) P < 1.0E-6. Samples
with a call rate lower than 95% or a heterozygosity rate more than
three standard deviations away from the mean were removed. Samples were
clear of gender-mismatches or relativeness.
Variants for validation were achieved through several filtering and
comparison steps (Figure 1 ). InterVar (v2.0.1) was used for
annotation(Q. Li & Wang, 2017). The highest impact effect was taken for
variants that have different annotations due to multiple transcripts.
SIFT, PolyPhen-2, MutationTaster, and FATHMM were used to predict the
possible impact of coding variants on protein function(Yang & Wang,
2015).
Association analysis
Association test was performed with PLINK (v1.90)(Chang et al., 2015).
Subjects were characterized according to their BMI and related
anthropometrics. Association analysis was performed using linear or
logistic regression assuming an additive model with sex and age as
covariates. To weaken the effect of population stratification, we also
included geographic information as a covariate because the maximal known
stratification for Chinese is northern and southern ancestry. To prove
the associated SNVs were independent, we performed a conditional
analysis on each one by adjusting other significant associated SNVs or
previous reported SNVs. Correction for multiple tests was performed
using the PLINK adjust function, with genomic control-corrected Pvalues being calculated based on the genotypes of all variants in the
final analysis. The threshold for genome-wide significance was set at aP value < 5.0 ×
10-8.
Genotyping and validation
Candidate SNVs showing nominal association (P < 0.05)
with obesity in the first stage were genotyped in an additional large
case-control cohort consisting of 2,480 obese children and 3,854
controls. Genotyping was performed using matrix-assisted laser
desorption/ionization time-of-flight mass spectrometry (SEQUENOM) at
Beijing Compass Biotechnology Co., Ltd. Multiplexed assays were designed
using MassARRAY Assay Design v3.1. Allele-specific base extension was
performed according to recommendations from Agena Bioscience. Sanger
sequencing was performed to validate the candidate variants identified
by WES and replication analysis.
Transcriptomic sequencing
Whole blood were obtained from 58 cases and 43 controls. Whole blood RNA
were extracted using PAXgene® Blood RNA Kit according to its handbook
protocols. Frozen subcutaneous adipose tissue were obtained from 11
children during plastic surgeries. RNA from adipose tissues were
extracted using mirVana™ miRNA Isolation Kit. Total cDNA library was
constructed using NEB Next® Ultra™ RNA Library Prep Kit and paired-end
sequencing was performed on Illumina novaseq6000 platform. Reads
alignment, transcripts including novel splice variants assembly,
abundance computation of these transcripts, and differentially expressed
genes and transcripts compare were analyzed with a comprehensive
software package of HISAT(Kim, Langmead, & Salzberg, 2015),
StringTie(Pertea et al., 2015), and DESeq2(Love, Huber, & Anders,
2014). We focus on the expression pattern of genes, such asSULT1A2 (NC_000016.10), MAP3K21 (NC_000001.11), etc.
which potentially affected by rs1059491 and rs189326455.
Functional analysis
We used MEME suite to predict the effect of SNPs on the binding affinity
of transcription related factors. To evaluate the structural and
conformational change of rs1059491 (SULT1A2: p.N235T), we used PSIPRED
for secondary structure prediction. The differential expression data of
eQTL rs1059491 in adipose tissues and blood were obtained from both
transcriptome sequencing and GTEx website (https://gtexportal.org/).
Visualization was achieved by R
(www.r-project.org/).