To directly address this question, we profiled a well-matched set of 463 nuclei and 463 cells from layer 5 of mouse primary visual cortex and identified 11 matching neuronal types: 2 interneuron types and 9 excitatory neuron types. Including intronic reads in gene expression quantification was necessary to achieve high-resolution cell type identification from single nuclei. Intronic reads substantially increased gene detection to 7000 genes per nucleus. In addition, intronic reads were more frequently derived from long genes that are known to have brain-specific expression \cite{Gabel2015} and that help define neuronal connectivity and signaling. Intronic reads may also reflect other cell type specific features, such as retained introns or alternative isoforms. For example, intron retention provides a mechanism for the nuclear storage and rapid translation of long transcripts in response to neuronal activity \cite{Mauger2016}.
We found that nuclei contain at least 20% of all cellular transcripts, and this percentage varies among cell types. For example, two small pyramidal neuron types that have large nuclei relative to overall cell size have nuclei that contain more than half of all transcripts. On average, we detect 4000 more genes in a single cell than a single nucleus, but over 20,000 genes are detected equally well in matched cells and nuclei. Cytoplasm-enriched transcripts are missed by profiling single nuclei but include mostly house-keeping genes, which are not related to neuronal identity. Nucleus-enriched transcripts include protein-coding and non-coding genes that are more likely to be cell type markers than cytoplasmic transcripts. However, single cells do provide overall better detection of cell type marker genes, thereby resulting in improved cluster separation for two pairs of highly similar cell types. Therefore, as more nuclei and cells are profiled, it is likely that finer discrimination of cell types would require single cell profiling. However, the benefits of profiling single nuclei may outweigh potential loss in the finest cell type resolution in certain circumstances, such as with human brain where access to live tissue for cell dissociation is extremely limited.
As large scale initiatives begin to characterize transcriptomic cell types in the whole brain \cite{Ecker2017} and whole organism \cite{Regev_2017}, it is important to understand the strengths and limitations of different mRNA profiling techniques. Here we show that snRNA-seq is well suited for large-scale surveys of cellular diversity in various tissues and has the potential to be less cell type biased than scRNA-seq. For example, single cell profiling of adult human cortex isolated 75% interneurons and 25% excitatory neurons \cite{Darmanis2015}, whereas single nucleus profiling of the same tissue type isolated 30% interneurons and 70% excitatory neurons \cite{Lake2016}, close to the proportions found in situ. Furthermore, snRNA-seq enables the use of archived frozen specimens to study cell types, which will ultimately inform our understanding of human evolution, diversity, and disease.
Materials and Methods
Animals and Tissue preparation
All procedures were approved by the Institutional Animal Care and Use Committee at the Allen Institute for Brain Science (Protocol no. 1511). Animals were provided food and water ad libitum and were maintained on a regular 12-h day/night cycle. Mice were maintained on the C57BL/6J background.
Tissue samples were obtained from adult (postnatal day (P) 53-59)) male and female transgenic mice carrying a Cre transgene and a Cre-reporter transgene. Mice were anesthetized with 5% isoflurane and intracardially perfused with either 25 or 50 ml of ice cold, oxygenated artificial cerebral spinal fluid (ACSF) at a flow rate of 9 ml per minute until the liver appeared clear, or the full volume of perfusate had been flushed through the vasculature. The ACSF solution consisted of 0.5mM CaCl2, 25mM D-Glucose, 98mM HCl, 20mM HEPES, 10mM MgSO4, 1.25mM NaH2PO4, 3mM Myo-inositol, 12mM N-acetylcysteine, 96mM N-methyl-D-glucamine, 2.5mM KCl, 25mM NaHCO3, 5mM sodium L-Ascorbate, 3mM sodium pyruvate, 0.01mM Taurine, and 2mM Thiourea. The brain was then rapidly dissected and mounted for coronal slice preparation on the chuck of a Compresstome VF-300 vibrating microtome (Precisionary Instruments). Using a custom designed photodocumentation configuration (Mako G125B PoE camera with custom integrated software), a blockface image was acquired before each section was sliced at 250 μm intervals. The slice was then hemisected along the midline, and both hemispheres were then transferred to chilled, oxygenated ACSF.
Each slice-hemisphere was transferred into a Sylgard-coated dissection dish containing 3 ml of chilled, oxygenated ACSF. Brightfield and fluorescent images between 4X and 20X were obtained of the intact tissue with a Nikon Digital Sight DS-Fi1 or a Sentech STC-SC500POE camera mounted to a Nikon SMZ1500 dissecting microscope. To guide anatomical targeting for dissection, boundaries were identified by trained anatomists, comparing the blockface image and the slice image to a matched plane of the Allen Reference Atlas. In general, three to five slices were sufficient to capture the targeted region of interest, allowing for expression analysis along the anterior/posterior axis. The region of interest was then dissected and both brightfield and fluorescent images of the dissections were acquired for secondary verification. The dissected regions were transferred in ACSF to a microcentrifuge tube, and stored on ice. This process was repeated for all slices containing the target region of interest, with each region of interest deposited into a new microcentrifuge tube.
For whole cell dissociation, after all regions of interest were dissected, the ACSF was removed and 1 ml of a 2 mg/ml pronase in ACSF solution was added. Tissue was digested at room temperature (approximately 22°C) for a duration that consisted of adding 15 minutes to the age of the mouse (in days; i.e., P53 specimen had a digestion time of 68 minutes). After digestion, the pronase solution was removed and replaced by 1 ml of ACSF supplemented with 1% Fetal Bovine Serum (FBS). The tissue was washed two more times with the same solution and the sample was then triturated using fire-polished glass pipettes of decreasing bore sizes (600, 300, and 150 μm). The cell suspension was incubated on ice in preparation for fluorescence-activated cell sorting (FACS). FACS preparation involved adding 4’-6-diamidino-2-phenylindole (DAPI) at a final concentration of 4 μg/ml to label dead (DAPI+) versus live (DAPI-) cells. The suspension was then filtered through a fine-mesh cell strainer to remove cell aggregates. Cells were sorted by excluding DAPI positive events and debris, and gating to include red fluorescent events (tdTomato-positive cells). Single cells were collected into strip tubes containing 11.5μl of collection buffer (SMART-Seq v4 lysis buffer 0.83x, Clontech #634894), RNase Inhibitor (0.17U/µl), and ERCCs (External RNA Controls Consortium, MIX1 at a final dilution of 1x10-8) \cite{16179916,25150836}. After sorting, strip tubes containing single cells were centrifuged briefly and then stored at -80°C.
For nuclei isolation, dissected regions of interest were transferred to microcentrifuge tubes, snap frozen in a slurry of dry ice and ethanol, and stored at -80°C until the time of use. To isolate nuclei, frozen tissues were placed into a homogenization buffer that consisted of 10mM Tris pH 8.0, 250mM sucrose, 25mM KCl, 5mM MgCl2, 0.1% Triton-X 100, 0.5% RNasin Plus RNase inhibitor (Promega), 1X protease inhibitor (Promega), and 0.1mM DTT. Tissues were placed into a 1ml dounce homogenizer (Wheaton) and homogenized using 10 strokes of the loose dounce pestle followed by 10 strokes of the tight pestle to liberate nuclei . Homogenate was strained through a 30µm cell strainer (Miltenyi Biotech) and centrifuged at 900xg for 10 minutes to pellet nuclei. Nuclei were then resuspended in staining buffer containing 1X PBS supplemented with 0.8% nuclease-free BSA and 0.5% RNasin Plus RNase inhibitor. Mouse anti-NeuN antibody (EMD Millipore, MAB377, Clone A60) was added to the nuclei at a final dilution of 1:1000 and nuclei suspensions were incubated at 4°C for 30 minutes. Nuclei suspensions were then centrifuged at 400xg for 5 minutes and resuspended in clean staining buffer (1X PBS, 0.8% BSA, 0.5% RNasin Plus). Secondary antibody (goat anti-mouse IgG (H+L), Alexa Fluor 594 conjugated, ThermoFisher Scientific) was applied to nuclei suspensions at a dilution of 1:5000 for 30 minutes at 4°C. After incubation in secondary antibody, nuclei suspensions were centrifuged at 400xg for 5 minutes and resuspended in clean staining buffer. Prior to FACS, DAPI was applied to nuclei suspensions at a final concentration of 0.1µg/ml and nuclei suspensions were filtered through a 35µm nylon mesh to remove aggregates. Single nuclei were captured by gating on DAPI-positive events, excluding debris and doublets, and then gating on Alexa Fluor 594 (NeuN) signal. Strip tubes containing FACS isolated single nuclei were then briefly centrifuged and frozen at -80°C.
RNA amplification and library preparation for RNA-seq
The SMART-Seq v4 Ultra Low Input RNA Kit for Sequencing (Clontech #634894) was used per the manufacturer’s instructions for reverse transcription of single cell RNA and subsequent cDNA synthesis. Single cells were stored in 8-strips at -80°C in 11.5 µl of collection buffer (SMART-Seq v4 lysis buffer at 0.83x, RNase Inhibitor at 0.17 U/µl, and ERCC MIX1 at a final dilution of 1x10-8 dilution). Twelve to 24 8-well strips were processed at a time (the equivalent of 1-2 96-well plates). At least 1 control strip was used per amplification set, containing 2 wells without cells but including ERCCs, 2 wells without cells or ERCCs, and either 4 wells of 10 pg of Mouse Whole Brain Total RNA (Zyagen, MR-201) or 2 wells of 10 pg of Mouse Whole Brain Total RNA (Zyagen, MR-201) and 2 wells of 10 pg Control RNA provided in the Clontech kit. Mouse whole cells were subjected to 18 PCR cycles after the reverse transcription step, whereas mouse nuclei were subjected to 21 PCR cycles. AMPure XP Bead (Agencourt AMPure beads XP PCR, Beckman Coulter A63881) purification was done using the Agilent Bravo NGS Option A instrument. A bead ratio of 1x was used (50 µl of AMPure XP beads to 50 µl cDNA PCR product with 1 µl of 10x lysis buffer added, as per Clontech instructions), and purified cDNA was eluted in 17 µl elution buffer provided by Clontech. All samples were quantitated using PicoGreen® on a Molecular Dynamics M2 SpectraMax instrument. A portion of the samples, and all controls, were either run on the Agilent Bioanalyzer 2100 using High Sensitivity DNA chips or the Advanced Analytics Fragment Analyzer (96) using the High Sensitivity NGS Fragment Analysis Kit (1bp-6000bp) to qualify cDNA size distribution. An average of 7.3 ng of cDNA was synthesized across all non-control samples. Purified cDNA was stored in 96-well plates at -20°C until library preparation.
Sequencing libraries were prepared using NexteraXT (Illumina, FC-131-1096) with NexteraXT Index Kit V2 Set A (FC-131-2001). NexteraXT libraries were prepared at 0.5x volume, but otherwise followed the manufacturer’s instructions. An aliquot of each amplified cDNA sample was first normalized to 30 pg/µl with Nuclease-Free Water (Ambion), then this normalized sample aliquot was used as input material into the NexteraXT DNA Library Prep (for a total of 75pg input). AMPure XP bead purification was done using the Agilent Bravo NGS Option A instrument. A bead ratio of 0.9x was used (22.5 ul of AMPure XP beads to 25 ul library product, as per Illumina protocol), and all samples were eluted in 22 µl of Resuspension Buffer (Illumina). All samples were run on either the Agilent Bioanalyzer 2100 using High Sensitivity DNA chips or the Advanced Analytics Fragment Analyzer (96) using the High Sensitivity NGS Fragment Analysis Kit (1bp-6000bp) to for sizing. All samples were quantitated using PicoGreen using a Molecular Dynamics M2 SpectraMax instrument. Molarity was calculated for each sample using average size as reported by Bioanalyzer or Fragment Analyzer and pg/µl concentration as determined by PicoGreen. Samples (5 µl aliquot) were normalized to 2-10 nM with Nuclease-free Water (Ambion), then 2 µl from each sample within one 96-index set was pooled to a total of 192 µl at 2-10 nM concentration. A portion of this library pool was sent to an outside vendor for sequencing on an Illumina HS2500. All of the library pools were run using Illumina High Output V4 chemistry. Covance Genomics Laboratory, a Seattle-based subsidiary of LabCorp Group of Holdings, performed the RNA-Sequencing services. An average of 229 M reads were obtained per pool, with an average of 2.0-3.1 M reads/cell across the entire data set.
RNA-Seq data processing
Raw read (fastq) files were aligned to the GRCm38 mouse genome sequence (Genome Reference Consortium, 2011) with the RefSeq transcriptome version GRCm38.p3 (current as of 1/15/2016) and updated by removing duplicate Entrez gene entries from the gtf reference file for STAR processing. For alignment, Illumina sequencing adapters were clipped from the reads using the fastqMCF program \cite{e2011}. After clipping, the paired-end reads were mapped using Spliced Transcripts Alignment to a Reference (STAR) \cite{Dobin2013} using default settings. STAR uses and builds it own suffix array index which considerably accelerates the alignment step while improving sensitivity and specificity, due to its identification of alternative splice junctions. Reads that did not map to the genome were then aligned to synthetic constructs (i.e. ERCC) sequences and the E.coli genome (version ASM584v2). Quantification was performed using summerizeOverlaps from the R package GenomicAlignments \cite{23950696}. Read alignments to the genome (exonic, intronic, and intergenic counts) were visualized as beeswarm plots using the R package beeswarm.
Expression levels were calculated as counts per million (CPM) of exonic plus intronic reads, and log2(CPM + 1) transformed values were used for a subset of analyses as described below. Gene detection was calculated as the number of genes expressed in each sample with CPM > 0. CPM values reflected absolute transcript number and gene length, i.e. short and abundant transcripts may have the same apparent expression level as long but rarer transcripts. Intron retention varied across genes so no reliable estimates of effective gene lengths were available for expression normalization. Instead, absolute expression levels were estimated as fragments per kilobase per million (FPKM) using only exonic reads so that annotated transcript lengths could be used.
Selection of single nuclei and matched cells
463 of 487 (95%) of single nuclei isolated from layer 5 of mouse VISp passed quality control criteria: >500,000 genome-mapped reads, >75% reads aligned, and >50% unique reads. 12,866 single cells isolated from layers 1-6 of mouse VISp passed quality control criteria: >200,000 transcriptome mapped reads and >1000 genes detected (CPM > 0).
Gene expression was more likely to drop out in samples with lower quality cDNA libraries and for low expressing genes. To estimate gene dropouts due to stochastic transcription or technical artifacts \cite{Kharchenko2014}, expression noise models were fit separately to single nuclei and cells using the "knn.error.models" function of the R package scde (version 2.2.0) with default settings and eight nearest neighbors. Noise models were used to calculate a dropout weight matrix that represented the likelihood of expression dropouts based on average gene expression levels of similar nuclei or cells using mode-relative weighting \cite{dbmi} . The probability of dropout for each sample (s) and gene (g) was estimated based on two expression measurements: average expected expression level of similar samples, \(p\left(x_{\overline{g}}\right)\), and observed expression levels, \(p\left(x_{sg}\right)\), using the "scde.failure.probability" and "scde.posteriors" functions. The dropout weighting was calculated as a combination of these probabilities: \(W_{sg}=1\ -\sqrt{p\left(x_{sg}\right)\cdot\sqrt{p\left(x_{sg}\right)\cdot p\left(x_{\overline{g}}\right)}}.\)