Determining unbiased marker sets
In the preceding analysis, we examined the expression of previously identified markers in our single-cell clusters. Alternatively, we can identify markers that define our single-cell states in an unbiased manner. We defined an unbiased set of markers using a likelihood ratio test that is specifically designed for zero-inflated data (McDavid et al., 2013) and that we have previously applied to Drop-seq data (Macosko et al., 2015). This test was run on the ‘normalized’ expression data, and we present the resulting list of markers in Table S2. As a non-parametric alternative, we can also identify genes that are up-regulated in each cluster based on the ‘corrected’ expression levels (after latent variable regression). Here, for each gene, we average the scaled residuals from the negative binomial regression across all cells within each cluster. After removing ribosomal and mitochondrial genes, we select the genes with the highest average score as cluster markers. Although this approach is not based on a statistical test, we found these marker sets to be more informative, as they were derived from the corrected data. We report the top 100 markers for each cluster in a separate tab of Table S2.
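The residual-averaging procedure can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used to produce Table S2. It assumes the scaled residuals are already available as a genes × cells array. The function names (`residual_markers`, `is_ribo_or_mito`) are hypothetical, and so is the prefix-based filter that treats gene names starting with RPS/RPL as ribosomal and those starting with MT- as mitochondrial.

```python
import numpy as np


def is_ribo_or_mito(gene: str) -> bool:
    # Hypothetical filter: assumes ribosomal genes start with RPS/RPL and
    # mitochondrial genes with MT- (case-insensitive, so mouse names also match).
    g = gene.upper()
    return g.startswith(("RPS", "RPL", "MT-"))


def residual_markers(residuals, genes, clusters, n_top=100):
    """Rank genes per cluster by their average scaled residual.

    residuals: genes x cells array of scaled residuals (assumed precomputed
               from a negative binomial regression).
    genes:     gene names, one per row of `residuals`.
    clusters:  cluster labels, one per column (cell) of `residuals`.
    Returns a dict mapping each cluster label to its top `n_top` gene names.
    """
    residuals = np.asarray(residuals, dtype=float)
    genes = np.asarray(genes)
    clusters = np.asarray(clusters)
    if residuals.shape != (len(genes), len(clusters)):
        raise ValueError("residuals must have shape (n_genes, n_cells)")

    # Drop ribosomal and mitochondrial genes before ranking.
    keep = np.array([not is_ribo_or_mito(g) for g in genes], dtype=bool)
    res = residuals[keep]
    kept_genes = genes[keep]

    markers = {}
    for c in np.unique(clusters):
        # Average each gene's residual across the cells of this cluster.
        avg = res[:, clusters == c].mean(axis=1)
        # Sort genes by decreasing average residual and keep the top n.
        order = np.argsort(-avg, kind="stable")
        markers[c] = kept_genes[order[:n_top]].tolist()
    return markers


# Toy usage: 5 genes x 4 cells, two clusters ("T" and "B").
genes = ["CD3E", "RPL13", "MS4A1", "MT-CO1", "LYZ"]
clusters = ["T", "T", "B", "B"]
residuals = np.array([
    [2.0, 1.8, -0.5, -0.4],   # CD3E: high in T cells
    [3.0, 3.0, 3.0, 3.0],     # RPL13: ribosomal, filtered out
    [-0.3, -0.2, 2.5, 2.2],   # MS4A1: high in B cells
    [4.0, 4.0, 4.0, 4.0],     # MT-CO1: mitochondrial, filtered out
    [0.1, 0.0, 0.2, 0.1],     # LYZ: near zero everywhere
])
markers = residual_markers(residuals, genes, clusters, n_top=2)
print(markers)
```

A name-prefix filter is a simple heuristic. It can also remove genes that are not ribosomal but share a prefix, such as RPS6KA1, so a curated gene list may be the safer choice in practice.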