Cluster visualization

The relationships between cell type clusters were represented as a constellation diagram in which the area of each disc is proportional to the number of nuclei in the cluster and the width of the line connecting two clusters is proportional to their similarity. Cluster similarity was calculated as the average co-clustering between all nuclei for each pair of clusters. For example, a similarity of 0.1 indicates that, in 1 out of 10 clustering iterations, nuclei from one cluster were assigned to the other cluster. Similarities less than 0.05 were not plotted.
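
The co-clustering similarity can be illustrated with a minimal sketch. It assumes that the cluster labels from every clustering iteration are available as plain lists; the function names, the toy labels, and the data layout are hypothetical and are not taken from the original analysis code.

```python
from itertools import combinations


def coclustering_matrix(runs):
    """Fraction of iterations in which each pair of nuclei shared a cluster.

    runs: list of label lists, one per clustering iteration, all of equal length.
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


def cluster_similarity(co, final_labels, min_similarity=0.05):
    """Average co-clustering over all nucleus pairs spanning two final clusters.

    Pairs of clusters with similarity below min_similarity are dropped,
    mirroring the plotting threshold described in the text.
    """
    members = {}
    for idx, label in enumerate(final_labels):
        members.setdefault(label, []).append(idx)
    edges = {}
    for a, b in combinations(sorted(members), 2):
        values = [co[i][j] for i in members[a] for j in members[b]]
        similarity = sum(values) / len(values)
        if similarity >= min_similarity:
            edges[(a, b)] = similarity
    return edges


# Toy data (assumed): 4 nuclei, 10 iterations; the two final clusters
# are merged in exactly 1 of the 10 iterations.
final_labels = ["A", "A", "B", "B"]
runs = [["m", "m", "m", "m"]] + [["p", "p", "q", "q"]] * 9
edges = cluster_similarity(coclustering_matrix(runs), final_labels)
print(edges)
```

In this toy case the similarity between clusters A and B is approximately 0.1, matching the worked example above. If the clusters merged in only 1 of 100 iterations, the similarity would be approximately 0.01 and the edge would be omitted.
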
Next, clusters were arranged by transcriptomic similarity based on hierarchical clustering. First, the average expression level of each gene was calculated for each cluster. Genes were then sorted based on variance and the top 2000 genes were used to calculate a correlation-based distance matrix, Dxy=1-(cor(x,y))/2, between each pair of cluster averages. A cluster tree was generated by performing hierarchical clustering on this distance matrix (using “hclust” with default parameters), and then reordered to show inhibitory clusters first, followed by excitatory clusters and glia, with larger clusters first, while respecting the tree structure. Note that this measure of cluster similarity is complementary to the co-clustering similarity described above. For example, two clusters with high transcriptomic similarity but a few distinct marker genes may have low co-clustering similarity.
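
The distance calculation can be sketched as follows. Two assumptions are made here: gene variance is computed across the cluster averages, and the distance is read as (1 - cor(x,y))/2 with Pearson correlation. The alternative reading of the formula, 1 - cor(x,y)/2, differs only by a constant offset of 0.5 and should not change the ordering of distances. The function names, gene symbols, and expression values below are illustrative only.

```python
import math


def variance(values):
    """Sample variance of a list of numbers (needs at least two values)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_distance_matrix(cluster_means, n_top=2000):
    """Correlation-based distance between cluster average profiles.

    cluster_means: dict mapping gene -> list of per-cluster mean expression.
    Only the n_top most variable genes are used.
    """
    top = sorted(cluster_means, key=lambda g: variance(cluster_means[g]),
                 reverse=True)[:n_top]
    n_clusters = len(next(iter(cluster_means.values())))
    profiles = [[cluster_means[g][k] for g in top] for k in range(n_clusters)]
    return [[(1 - pearson(profiles[a], profiles[b])) / 2
             for b in range(n_clusters)]
            for a in range(n_clusters)]


# Toy cluster averages (assumed values) for three clusters and four genes.
cluster_means = {
    "GAD1": [9.0, 8.0, 0.5],
    "SLC17A7": [0.2, 0.4, 9.5],
    "SST": [7.0, 0.5, 0.1],
    "ACTB": [10.0, 10.1, 9.9],
}
D = cluster_distance_matrix(cluster_means, n_top=3)
for row in D:
    print([round(d, 3) for d in row])
```

With `n_top=3` the nearly constant gene is excluded, and the first two toy clusters end up closer to each other than either is to the third. The resulting matrix is what would be passed to a hierarchical clustering routine such as “hclust”.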

Marker gene selection

Initial sets of marker genes for each pair of clusters were selected by assessing significance of differential expression using the “limma” (Ritchie et al., 2015) R package, and then filtering these sets of significant genes to include only those expressed in more than 50% of nuclei in the “on” cluster and fewer than 20% of nuclei in the “off” cluster. Potential marker genes for individual clusters were chosen by ranking the significance of pairwise marker genes, summing the ranks across all possible pairs for a given cluster, and sorting the resulting gene list ascending by summed rank. The final set of marker genes was selected by comparing the gene expression distribution for the top ranked marker genes for each cluster using the visualization described below.
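
The proportion filter and the rank-sum ordering can be sketched as below. This is a simplified illustration, not the original pipeline: the p-values stand in for the output of a differential expression test such as “limma”, every gene is assumed to be ranked in every pairwise comparison, and all function names and values are hypothetical. How the filter and the ranking were combined in practice is not specified in the text, so they are shown as separate steps.

```python
def passes_proportion_filter(frac_on, frac_off, min_on=0.5, max_off=0.2):
    """True if a gene is expressed in more than min_on of "on" nuclei
    and in fewer than max_off of "off" nuclei."""
    return frac_on > min_on and frac_off < max_off


def rank_markers(pvalues_by_pair):
    """Order genes for one target cluster by summed pairwise rank.

    pvalues_by_pair: dict mapping each other cluster -> {gene: p-value}
    for the comparison of the target cluster against that cluster.
    Returns genes sorted ascending by summed rank (best markers first).
    """
    summed = {}
    for pvals in pvalues_by_pair.values():
        ordered = sorted(pvals, key=pvals.get)  # most significant first
        for rank, gene in enumerate(ordered, start=1):
            summed[gene] = summed.get(gene, 0) + rank
    return sorted(summed, key=summed.get)


# Toy p-values (assumed) for target cluster A against clusters B, C and D.
pvalues_by_pair = {
    "B": {"g1": 1e-8, "g2": 1e-3, "g3": 0.2},
    "C": {"g1": 1e-5, "g2": 1e-6, "g3": 0.5},
    "D": {"g1": 1e-9, "g2": 1e-2, "g3": 0.3},
}
ranked = rank_markers(pvalues_by_pair)
print(ranked)
print(passes_proportion_filter(0.8, 0.1), passes_proportion_filter(0.8, 0.3))
```

Here g1 has summed rank 4, g2 has 5, and g3 has 9, so g1 is the top candidate marker for the target cluster.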

Cluster matching

Gene expression visualization

Gene expression (CPM) was visualized using heat maps and violin plots, which both show genes as rows and nuclei as columns, sorted by cluster. Heat maps display each nucleus as a short vertical bar, color-coded by expression level (blue=low; red=high), with clusters ordered as described above. The distribution of marker gene expression across nuclei in each cluster was represented as violin plots, which are density plots turned 90 degrees and reflected about the Y-axis. Black dots indicate the median gene expression in nuclei of a given cluster; dots above Y=0 indicate that a gene is expressed in more than half of the nuclei in that cluster.

Estimating nuclear proportions