Introduction

Reverse engineering neural circuits requires characterization of their cellular components. Cell types in mammalian brain have been defined based on shared morphological, electrophysiological and, more recently, molecular properties. Single cell RNA-sequencing (scRNA-seq) is a high-throughput method to quantify the majority of transcripts in thousands of cells, and these data have revealed diverse cell types in many mouse brain regions, including neocortex \cite{Tasic_Yao_Smith_Graybuck_Nguyen_2017,Tasic2016,Zeisel2015}, hypothalamus \cite{Campbell2017}, and retina \cite{Shekhar2016,Macosko2015}.
However, single cell profiling does not provide an unbiased survey of neural cell types. Some cell types are more vulnerable to the tissue dissociation process and are underrepresented in the final data set. For example, in mouse neocortex, fast-spiking parvalbumin-positive interneurons and deep-projecting glutamatergic neurons in layer 5b are less than their expected proportions and must be selectively enriched using Cre-driver lines \cite{Tasic2017}. In adult human neocortex, neurons largely do not survive dissociation, whereas non-neuronal cells are highly over-represented \cite{Darmanis2015}. In contrast to whole cells, nuclei are more resistant to mechanical assaults and can be isolated from frozen tissue (@REF). Single nuclei provide sufficient gene expression information to many subtypes of broad cell classes in adult human brain \cite{Lake2016,29227469} and mouse hippocampus \cite{Habib2016}.
However, it remains unknown if the nucleus contains sufficient diversity and number of transcripts to allow for discrimination of closely related cell types at a resolution comparable to whole cells. A recent study compared clustering results for single nuclei and whole cells isolated from mouse somatosensory cortex \cite{Lake2017}, but it only showed similar detection of two highly distinct cell populations: superficial and deeper layer excitatory neurons.
In this study, we compared 463 matched nuclei and whole cells from layer 5 of mouse primary visual cortex (VISp) to investigate differences in single nucleus compared to single cell transcriptomes. We selected this brain region because it contains a known variety of distinct and highly similar cell types that would reveal the cell type detection limit of single cells and nuclei \cite{Tasic2017}. We used the same primary cell source and processed cells and nuclei with the same transcriptomic profiling method, to directly compare the resolution limit of cell type detection from well matched sets of single cells and nuclei.

Results

RNA-Seq profiling of single nuclei and single cells

We isolated 487 NeuN-positive single nuclei from layer 5 of mouse VISp using fluorescence activated cell sorting (FACS). Anti-NeuN staining was performed to enrich for neurons. In parallel, we FAC-sorted 12,866 tdT-positive single cells from all layers of mouse VISp from a variety of Cre-driver lines. For both single nuclei and cells, poly(A)-transcripts were amplified with SMART-Seq v4, cDNA was tagmented by Nextera XT and resulting libraries were sequenced to an average depth of [@2 million reads] (Figure \ref{856440}A). RNA-seq reads were mapped using STAR and gene expression was quantified as the sum of intronic and exonic reads and normalized as counts per million (CPM) and log2-transformed. For each nucleus and cell, the probability of gene dropouts were estimated as a function of average expression level based on empirical noise models.
463 single nuclei (95%) passed quality control metrics, and each nucleus was matched to the most similar nucleus and cell based on the maximum correlated expression of all genes, weighted for gene dropouts. Nuclei had similarly high pairwise correlations to cells as to other nuclei suggesting that cells and nuclei were well matched (Figure \ref{856440}B). As expected, matched cells were derived almost exclusively from layer 5 and adjacent layers 4 and 6 (Figure \ref{259770}B) and from Cre-driver lines that labeled cells in layer 5 (Figure \ref{856440}C and Figure \ref{259770}A,C). The small minority of matched cells isolated from superficial layers were GABAergic interneurons that had similar expression to cells sampled from deeper layers.

Comparison of nuclear and whole cell transcriptomes

Cells contain transcripts from both the nucleus and cytoplasm and therefore should have different distributions of transcripts based on RNA-seq read alignment. Indeed, more than 90% of reads aligned to the mouse genome for most nuclei and cells, but, in nuclei, more than 50% of these reads did not map to known spliced transcripts but were located within genic bounds and were annotated as intronic reads (Figure \ref{835862}A). In contrast, the majority of cells had less than 30% intronic reads with a minority of cells having closer to 50% intronic reads, similar to nuclei. Since intronic reads should be largely restricted to the nucleus [@REF], this suggests that cells have a variable amount of nuclear content. Median gene detection was less for nuclei (7000 genes) than cells (11,000 genes), although including intronic reads increased detection by approximately 2000 genes for both nuclei and cells. Whole brain control RNA had read mapping similar to cells, consistent with dissociated single cells capturing the majority of cell transcripts. Only 10 pg of control RNA was used, and consequently gene detection was similar to single nuclei.
Gene dropouts result from missed detection and biological variability, and both effects are more pronounced in nuclei than cells. Accounting for gene dropouts increased correlations between pairs of nuclei and pairs of cells, although cell-cell similarities remained significantly higher (Figure \ref{835862}B). The vast majority of genes show similar detection in nuclei and cells, while 7217 genes (17%) are detected in at least 25% more cells than nuclei (Figure \ref{835862}C). 8614 genes have significantly higher expression in cells than nuclei (>1.5 fold expression; FDR < 0.05) and are involved in housekeeping functions such as mRNA processing and protein translation (Figure \ref{835862}D). Genetic markers of neuronal activity, such as immediate early genes FosEgr1, and Arc also had increased expression in cells, potentially due to neuron stress during tissue dissociation. 159 genes have higher expression in nuclei, and they are central to neuronal identity including connectivity and signaling genes (Figure \ref{835862}D and Figure \ref{898251}B). Nuclear enriched genes are more than 10-fold longer based on genomic length than cell enriched genes due to the the inclusion of intronic reads (Figure \ref{898251}A), as reported by \citet{Lake2017}. When only exonic reads were used to quantify expression in nuclei and cells, a different set of 146 genes were significantly enriched in nuclei, and many genes were involved in mRNA splicing.

Intronic reads required for similar discrimination of nuclei and cell clusters

Next, we applied an iterative clustering procedure (see Methods and Figure \ref{859252}) to identify clusters of single nuclei and cells that share gene expression profiles. To assess cluster robustness, we repeated clustering 100 times using 80% random sub-samples of nuclei and cells and calculated the proportion of clustering runs that each pair of samples were clustered together. Co-clustering matrices were reordered using Ward's hierarchical clustering and represented as heatmaps with consistent clusters ordered as squares along the diagonal (Figure \ref{984877}A,B).
Clustering includes two steps -- variable gene selection and distance measurement -- that are particularly sensitive to expression quantification. We repeated clustering using intronic and exonic reads or only exonic reads for these steps and ordered co-clustering matrices to match the results using all reads for both steps. When using introns and exons, we found 11 distinct clusters of nuclei and cells, and clusters had similar cohesion (average within cluster co-clustering) and separation (average co-clustering difference with the closest cluster) (Figure \ref{984877}C). Excluding intronic reads for either clustering step reduced the number of clusters detected for nuclei but not cells. Therefore, intronic reads were critical to detect the same number of cell types using single nuclei and cells.

Equivalent cell types identified with nuclei and cells

Cluster relationships were represented as dendrograms that are strikingly similar for nuclei and cells (Figure \ref{985661}A). We compared the 11 clusters identified with single nuclei and cells to reported cell types in mouse VISp \cite{Tasic2016}. Each nuclei and cell cluster could be linked to a reported cell type (Figure \ref{953199}A) and to each other (Figure \ref{985661}B) based on correlated expression of cell type informative genes. Many genes contributed to high expression correlations (r > 0.85) for all cluster pairs (Figure \ref{953199}B). Conserved marker gene expression confirmed that the same 11 cell types were identified with nuclei and cells (Figure \ref{985661}C). These cell types included nine excitatory neuron types from layers 4-6 and two inhibitory interneuron types. Matched cluster proportions were mostly consistent, except two closely related layer 5a subtypes were under- (L5a Batf3) or over-represented (L5a Hsd11b1) among cells (Figure \ref{953199}C). This suggested that the initial matching of cells to nuclei was relatively unbiased.
We hypothesized that most intronic reads were mapped to nuclear transcripts, so quantifying gene expression in cells using only introns would approximate nuclear expression. This was supported by higher correlations of average expression across all nuclei and cells using intronic reads as compared to exonic reads (Figure \ref{953199}D). Thus, a dendrogram of nuclei and cell clusters based on intronic reads paired all matching cell types, except for two closely related layer 5b subtypes. Given this mismatch, this suggests that one cannot simply cluster all cells and nuclei together using introns. In contrast, a dendrogram based on exonic reads separated all cell clusters from nuclei clusters (Figure \ref{985661}D).
While we detected the same cell types using nuclei and cells, we expected that gene expression captured with cells included some additional information. First, we compared the separation of matched pairs of clusters and found that all nuclei and cell clusters were similarly distinct, except one layer 4 type and one layer 5b type were significantly more distinct for cells (Figure \ref{985661}E). Next, we compared how well genes marked cell types by calculating the degree of binary expression. Cell marker scores were, on average, 15% higher than nuclei scores due to fewer expression dropouts in cells (Figure \ref{985661}F).

Nuclear content varies among cell types and genes

Discussion

Materials and Methods

Tissue preparation

Tissue samples were obtained from adult (postnatal day (P) 53-59)) male and female transgenic mice carrying a Cre transgene and a Cre-reporter transgene. Mice were anesthetized with 5% isoflurane and intracardially perfused with either 25 or 50 ml of ice cold, oxygenated artificial cerebral spinal fluid (ACSF) at a flow rate of 9 ml per minute until the liver appeared clear, or the full volume of perfusate had been flushed through the vasculature. The ACSF solution consisted of 0.5mM CaCl2, 25mM D-Glucose, 98mM HCl, 20mM HEPES, 10mM MgSO4, 1.25mM NaH2PO4, 3mM Myo-inositol, 12mM N-acetylcysteine, 96mM N-methyl-D-glucamine, 2.5mM KCl, 25mM NaHCO3, 5mM sodium L-Ascorbate, 3mM sodium pyruvate, 0.01mM Taurine, and 2mM Thiourea. The brain was then rapidly dissected and mounted for coronal slice preparation on the chuck of a Compresstome VF-300 vibrating microtome (Precisionary Instruments). Using a custom designed photodocumentation configuration (Mako G125B PoE camera with custom integrated software), a blockface image was acquired before each section was sliced at 250 μm intervals. The slice was then hemisected along the midline, and both hemispheres were then transferred to chilled, oxygenated ACSF. 
Each slice-hemisphere was transferred into a Sylgard-coated dissection dish containing 3 ml of chilled, oxygenated ACSF. Brightfield and fluorescent images between 4X and 20X were obtained of the intact tissue with a Nikon Digital Sight DS-Fi1 or a Sentech STC-SC500POE camera mounted to a Nikon SMZ1500 dissecting microscope. To guide anatomical targeting for dissection, boundaries were identified by trained anatomists, comparing the blockface image and the slice image to a matched plane of the Allen Reference Atlas. In general, three to five slices were sufficient to capture the targeted region of interest, allowing for expression analysis along the anterior/posterior axis. The region of interest was then dissected and both brightfield and fluorescent images of the dissections were acquired for secondary verification. The dissected regions were transferred in ACSF to a microcentrifuge tube, and stored on ice. This process was repeated for all slices containing the target region of interest, with each region of interest deposited into a new microcentrifuge tube. 
For whole cell dissociation, after all regions of interest were dissected, the ACSF was removed and 1 ml of a 2 mg/ml pronase in ACSF solution was added. Tissue was digested at room temperature (approximately 22°C) for a duration that consisted of adding 15 minutes to the age of the mouse (in days; i.e., P53 specimen had a digestion time of 68 minutes). After digestion, the pronase solution was removed and replaced by 1 ml of ACSF supplemented with 1% Fetal Bovine Serum (FBS). The tissue was washed two more times with the same solution and the sample was then triturated using fire-polished glass pipettes of decreasing bore sizes (600, 300, and 150 μm). The cell suspension was incubated on ice in preparation for fluorescence-activated cell sorting (FACS). FACS preparation involved adding 4’-6-diamidino-2-phenylindole (DAPI) at a final concentration of 4 μg/ml to label dead (DAPI+) versus live (DAPI-) cells. The suspension was then filtered through a fine-mesh cell strainer to remove cell aggregates. Cells were sorted by excluding DAPI positive events and debris, and gating to include red fluorescent events (tdTomato-positive cells). Single cells were collected into strip tubes containing 11.5μl of collection buffer (SMART-Seq v4 lysis buffer 0.83x, Clontech #634894), RNase Inhibitor (0.17U/µl), and ERCCs (External RNA Controls Consortium, MIX1 at a final dilution of 1x10-8) \cite{16179916,25150836}. After sorting, strip tubes containing single cells were centrifuged briefly and then stored at -80°C.
For nuclei isolation, dissected regions of interest were transferred to microcentrifuge tubes, snap frozen in a slurry of dry ice and ethanol, and stored at -80°C until the time of use.  To isolate nuclei, frozen tissues were placed into a homogenization buffer that consisted of 10mM Tris pH 8.0, 250mM sucrose, 25mM KCl, 5mM MgCl2, 0.1% Triton-X 100, 0.5% RNasin Plus RNase inhibitor (Promega), 1X protease inhibitor (Promega), and 0.1mM DTT.  Tissues were placed into a 1ml dounce homogenizer (Wheaton) and homogenized using 10 strokes of the loose dounce pestle followed by 10 strokes of the tight pestle to liberate nuclei . Homogenate was strained through a 30µm cell strainer (Miltenyi Biotech) and centrifuged at 900xg for 10 minutes to pellet nuclei. Nuclei were then resuspended in staining buffer containing 1X PBS supplemented with 0.8% nuclease-free BSA and 0.5% RNasin Plus RNase inhibitor. Mouse anti-NeuN antibody (EMD Millipore, MAB377, Clone A60) was added to the nuclei at a final dilution of 1:1000 and nuclei suspensions were incubated at 4°C for 30 minutes. Nuclei suspensions were then centrifuged at 400xg for 5 minutes and resuspended in clean staining buffer (1X PBS, 0.8% BSA, 0.5% RNasin Plus). Secondary antibody (goat anti-mouse IgG (H+L), Alexa Fluor 594 conjugated, ThermoFisher Scientific) was applied to nuclei suspensions at a dilution of 1:5000 for 30 minutes at 4°C. After incubation in secondary antibody, nuclei suspensions were centrifuged at 400xg for 5 minutes and resuspended in clean staining buffer. Prior to FACS, DAPI was applied to nuclei suspensions at a final concentration of 0.1µg/ml and nuclei suspensions were filtered through a 35µm nylon mesh to remove aggregates. Single nuclei were captured by gating on DAPI-positive events, excluding debris and doublets, and then gating on Alexa Fluor 594 (NeuN) signal. Strip tubes containing FACS isolated single nuclei were then briefly centrifuged and frozen at -80°C.