Determining unbiased marker sets
In the previous analysis, we examined the expression of previously
identified markers in our single-cell clusters. Alternatively, we can
identify the markers that define each single-cell state in an unbiased
manner. To do so, we used a likelihood ratio test that is specifically
designed for zero-inflated data (McDavid et al., 2013) and that we
previously applied to Drop-seq data (Macosko et al., 2015). We ran this
test on the ‘normalized’ expression data and list the resulting markers
in Table S2. As a non-parametric alternative, we can
also identify genes that are up-regulated in each cluster based on the
‘corrected’ expression levels (after latent variable regression).
Specifically, for each gene we average the scaled residuals from the
negative binomial regression across all cells within each cluster,
remove ribosomal and mitochondrial genes, and select the genes with the
highest average score as markers for that cluster. Although this
approach is not based on a statistical test, we found that the resulting
marker sets were more informative, as they were derived from the
corrected data. We report the top 100 markers for each cluster in a
separate tab of Table S2.
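The likelihood ratio test for zero-inflated data can be illustrated with a two-part (hurdle) model in the spirit of McDavid et al. (2013): one part models whether a gene is detected in a cell, the other models its level in the cells where it is detected. The sketch below compares one gene between the cells of a cluster and all other cells. It is a minimal sketch, not the implementation used to produce Table S2; the detection threshold (value > 0), the clipping floor of 1e-5 on the detection rate, the maximum-likelihood (population) SD, and the unit-SD fallback are all assumptions. The statistic is referred to a chi-squared distribution with 3 degrees of freedom (the extra detection rate, mean, and SD fitted under the alternative), whose survival function has a closed form, so only the standard library is needed.

```python
import math
import statistics


def chi2_sf_df3(stat: float) -> float:
    """P(X > stat) for a chi-squared variable with 3 degrees of freedom (closed form)."""
    if stat <= 0.0:
        return 1.0
    return (math.erfc(math.sqrt(stat / 2.0))
            + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0))


def hurdle_loglik(values, floor=1e-5):
    """Maximized log-likelihood of a two-part model for one gene in one group of cells.

    Discrete part: Bernoulli detection rate (value > 0), clipped to [floor, 1 - floor].
    Continuous part: Gaussian on the detected values with MLE mean and SD.
    """
    n = len(values)
    positives = [v for v in values if v > 0]
    k = len(positives)
    rate = min(max(k / n, floor), 1.0 - floor)
    loglik = (n - k) * math.log(1.0 - rate) + k * math.log(rate)
    if positives:
        mean = statistics.fmean(positives)
        sd = statistics.pstdev(positives) if k > 1 else 0.0
        if sd <= 0.0:
            sd = 1.0  # assumption: unit SD when the spread is undefined or zero
        for v in positives:
            loglik += (-0.5 * math.log(2.0 * math.pi * sd * sd)
                       - (v - mean) ** 2 / (2.0 * sd * sd))
    return loglik


def bimodal_lrt(group_a, group_b):
    """Return (statistic, p_value) for one gene: separate fits vs. one pooled fit."""
    separate = hurdle_loglik(group_a) + hurdle_loglik(group_b)
    pooled = hurdle_loglik(list(group_a) + list(group_b))
    # Clip tiny negative values caused by floating-point error or the SD fallback.
    stat = max(0.0, 2.0 * (separate - pooled))
    return stat, chi2_sf_df3(stat)


# Hypothetical normalized values for one gene: cells in a cluster vs. all other cells.
in_cluster = [0, 0, 3.1, 2.9, 3.0, 3.2, 2.8, 0, 3.1, 3.0]
rest = [0.0] * 10
stat, p = bimodal_lrt(in_cluster, rest)
print(f"LRT statistic = {stat:.2f}, p = {p:.2g}")
```

In this toy example the detected values are identical in the pooled and separate fits, so the whole signal comes from the difference in detection rate (70% vs. 0%); a gene that differs only in its level among detected cells would instead contribute through the Gaussian part.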
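The residual-averaging procedure for ranking cluster markers can be sketched as follows. The sketch assumes the scaled residuals from the negative binomial regression on latent variables have already been computed (rows are genes, columns are cells); it does not reproduce that regression. The function names and the toy data are hypothetical, and the ribosomal/mitochondrial filter is an assumed case-insensitive prefix match on RPS, RPL, and MT- (covering both human and mouse naming). This is a crude filter: it would also drop non-ribosomal genes such as the RPS6KA kinase family.

```python
from collections import defaultdict


def is_ribo_or_mito(gene: str) -> bool:
    """Assumed name-based filter: RPS*/RPL* (ribosomal) or MT-* (mitochondrial)."""
    return gene.upper().startswith(("RPS", "RPL", "MT-"))


def top_markers_by_residual(residuals, genes, labels, n_top=100):
    """Rank genes within each cluster by their mean scaled residual over that cluster's cells.

    residuals: residuals[g][c] is the scaled residual of gene g in cell c
    genes:     gene names aligned with the rows of `residuals`
    labels:    cluster label of each cell (column)
    """
    members = defaultdict(list)
    for cell, label in enumerate(labels):
        members[label].append(cell)

    markers = {}
    for label, cells in members.items():
        scores = []
        for gene, row in zip(genes, residuals):
            if is_ribo_or_mito(gene):
                continue  # exclude ribosomal and mitochondrial genes before ranking
            scores.append((sum(row[c] for c in cells) / len(cells), gene))
        scores.sort(key=lambda s: s[0], reverse=True)
        markers[label] = [gene for _, gene in scores[:n_top]]
    return markers


# Hypothetical toy data: 5 genes x 4 cells in two clusters.
genes = ["GeneA", "GeneB", "RPL13", "MT-CO1", "GeneC"]
labels = ["c1", "c1", "c2", "c2"]
residuals = [
    [2.0, 1.5, -0.5, -0.4],  # GeneA: elevated in c1
    [-0.3, -0.2, 1.8, 2.2],  # GeneB: elevated in c2
    [3.0, 3.0, 3.0, 3.0],    # RPL13: high everywhere, but filtered out
    [2.5, 2.5, 2.5, 2.5],    # MT-CO1: filtered out
    [0.1, 0.0, 0.2, 0.1],    # GeneC: roughly flat
]
markers = top_markers_by_residual(residuals, genes, labels, n_top=2)
print(markers)  # {'c1': ['GeneA', 'GeneC'], 'c2': ['GeneB', 'GeneC']}
```

Because the score is a plain average rather than a test statistic, it yields a ranking but no p-values; with n_top=100 the output corresponds in shape to a per-cluster top-100 list.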