Dropout weighted Pearson correlations were calculated between all pairs of nuclei and cells using 42,003 genes expressed in at least one nucleus and one cell. The cell with the highest correlation to any nucleus was selected as the best match, and this cell and nucleus were removed from further analysis. This process was repeated until 463 best matching cells were selected, and the expression correlations were compared to correlations of the best matching pairs of nuclei (Figure \ref{856440}B). The Cre-lines and dissected cortical layers of origin of the best matching cells were summarized as bar plots (Figure \ref{259770}). Unweighted Pearson correlations were also calculated between all pairs of nuclei and cells to test the effect of accounting for dropouts on sample similarities (Figure \ref{835862}B).

Differential expression analysis

Gene detection was estimated as the proportion of cells and nuclei expressing each gene (CPM > 0). In order to estimate the expected variability of gene detection as a result of population sampling, cells were randomly split into two sets of 231 and 232 cells and genes were grouped into 50 bins based on detection in the first set of cells. For each bin of genes, the 97.5 percentile of detection was calculated for the second set of cells. A 95% confidence interval of gene detection was constructed by reflecting this these binned quantiles across the line of unity. Data were summarized with a hexagonal binned scatter plot and a log-transformed color scale using the R package ggplot2 \cite{Wickham:2009:GEG:1795559}.
Differential expression between nuclei and cells was calculated with the R package limma \cite{ritchie2015limma} using default settings and log2(CPM + 1) expression defined based on two sets of reads: introns plus exons and only exons. Significantly differentially expressed were defined as having >1.5-fold change and a Benjamini-Hochberg corrected P-value < 0.05. Gene expression distributions of nuclei or cells within a cluster were visualized using violin plots, density plots rotated 90 degrees and reflected on the Y-axis.
Differences in alignment statistics and gene counts were calculated between cells, nuclei, and total RNA controls (or just cells and nuclei) with analysis of variance using the "aov" function in R \cite{m1992}. P-values for all comparisons were P<10-13.
Two sets of nucleus- and cell-enriched genes (introns plus exons and exons only) were tested for gene ontology (GO) enrichment using the ToppGene Suite \cite{Chen_2009}. Significantly enriched (Benjamini-Hochberg false discovery rate < 0.05) GO terms were summarized as tree maps with box sizes proportional to -log10(P-values) using REVIGO \cite{Supek_2011} (Figure \ref{898251}).

Clustering

Nuclei and cells were grouped into transcriptomic cell types using an iterative clustering procedure based on community detection in a nearest neighbor graph as described in \citet{26095251}. Clustering was performed using gene expression quantified with exonic reads only or intronic plus exonic reads for two key clustering steps: selecting significantly variable genes and calculating pairwise similarities between nuclei. Four combinations of expression quantification for nuclei and cells resulted in eight independent clustering runs.
For each gene, log2(CPM + 1) expression was centered and scaled across samples. Noise models were used to select significantly variable genes (adjusted variance > 1.25). Dimensionality reduction was performed with principal components analysis (PCA) on variable genes, and the covariance matrix was adjusted to account for gene dropouts using the product of dropout weights across genes for each pair of samples. A maximum of 20 principal components (PCs) were retained for which more variance was explained than the broken stick null distribution, a conservative method of PC retention \cite{Jackson_1993}.
Nearest-neighbor distances between all samples were calculated using the "nn2" function of the R package RANN, and Jaccard similarity coefficients between nearest-neighbor sets were computed. Jaccard coefficients measured the proportion of nearest neighbors shared by each sample and were used as edge weights in constructing an undirected graph of samples. Louvain community detection was used to cluster this graph with 15 nearest neighbors. Considering more than 15 neighbors reduced the power to detect small clusters due to the resolution limit of community detection \cite{Fortunato_Barthelemy_2007}. Considering fewer than 15 neighbors increased over-splitting, as expected based on simulations by \citet{reichardt2006statistical}. Fewer nearest neighbors were used only when there were 15 or fewer samples total.
Clustering significance was tested by comparing the observed modularity to the expected modularity of an Erdös-Rényi random graph with a matching number of nodes and average connection probability. Expected modularity was calculated as the maximum estimated by two reported equations \cite{Guimerà_Sales-Pardo_Amaral_2004,reichardt2006statistical}. Samples were split into clusters only if the observed modularity was greater than the expected modularity, and only clusters with distinct marker genes were retained. Marker genes were defined for all cluster pairs using two criteria: 1) significant differential expression (Benjamini-Hochberg false discovery rate < 0.05) using the R package limma and 2) either binary expression (CPM > 1 in >50% samples in one cluster and <10% in the second cluster) or >100-fold difference in expression. Pairs of clusters were merged if either cluster lacked at least one marker gene.
Clustering was applied iteratively to each sub-cluster until the occurrence of one of four stop criteria: 1) fewer than six samples (due to a minimum cluster size of three); 2) no significantly variable genes; 3) no significantly variable PCs; 4) no significant clusters.
To assess the robustness of clusters, the iterative clustering procedure described above was repeated 100 times for random sets of 80% of samples. A co-clustering matrix was generated that represented the proportion of clustering iterations that each pair of samples were assigned to the same cluster. Average-linkage hierarchical clustering was applied to this matrix followed by dynamic branch cutting using "cutreeHybrid" in the R package WGCNA \cite{langfelder2007defining} with cut height ranging from 0.01 to 0.99 in steps of 0.01. A cut height was selected that resulted in the median number of clusters detected across all 100 iterations. Cluster cohesion (average within cluster co-clustering) and separation (difference between within cluster co-clustering and maximum between cluster co-clustering) was calculated for all clusters. Marker genes were defined for all cluster pairs as described above, and clusters were merged if they had a co-clustering separation <0.25 or either cluster lacked at least one marker gene.

Scoring marker genes based on cluster specificity

Many genes were expressed in the majority of nuclei or cells in a subset of clusters. A marker score (beta) was defined for all genes to measure how binary expression was among clusters, independent of the number of clusters labeled. First, the proportion (xi) of samples in each cluster that expressed a gene above background level (CPM > 1) was calculated. Then, scores were defined as the squared differences in proportions normalized by the sum of absolute differences plus a small constant (ε) to avoid division by zero. Scores ranged from 0 to 1, and a perfectly binary marker had a score equal to 1.