Dimensionality reduction
From the normalized expression matrix, we first identified a set of
variable genes with high dispersion across cells. Briefly, we
calculated the mean and variance of each gene in non-log space and
defined its dispersion as the variance divided by the mean. We selected the 5,000 genes with
the highest dispersions as variable genes for dimensionality reduction,
a common step in single-cell data analysis for reducing noise and
capturing biological signals. We then used independent component
analysis (ICA), which was originally developed to separate a mixed
multivariate signal into additive, statistically independent sources,
and has more recently been applied to dimensionality reduction of
single-cell data (Trapnell et al., 2014). We implemented ICA using the
R package ica (Helwig and Hong, 2013; Hyvärinen, 1999).
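The variable-gene selection described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' code. It assumes a genes × cells matrix of log1p-transformed normalized values (the exact log transform used upstream is an assumption), and the function name `select_variable_genes` is hypothetical.

```python
import numpy as np


def select_variable_genes(log_expr: np.ndarray, n_top: int = 5000) -> np.ndarray:
    """Return row indices of the n_top genes with the highest dispersion.

    log_expr: genes x cells matrix of log1p-transformed normalized
    expression (assumed input format).
    """
    # Undo the (assumed) log1p transform so statistics are in non-log space.
    expr = np.expm1(log_expr)
    mean = expr.mean(axis=1)
    var = expr.var(axis=1, ddof=1)
    # Dispersion = variance / mean; genes with zero mean get dispersion 0.
    dispersion = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)
    # Indices sorted by decreasing dispersion, truncated to n_top.
    return np.argsort(dispersion)[::-1][:n_top]


# Toy example: 4 genes x 4 cells of normalized (non-log) values.
raw = np.array([[1, 1, 1, 1],      # constant -> dispersion 0
                [0, 2, 0, 2],      # mean 1, variance 4/3
                [0, 10, 0, 10],    # mean 5, variance 100/3
                [0, 0, 0, 0]],     # zero mean -> dispersion 0
               dtype=float)
top = select_variable_genes(np.log1p(raw), n_top=2)  # array([2, 1])
```

Using the variance-to-mean ratio rather than the raw variance keeps highly expressed genes from dominating the selection simply because their counts are larger.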
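To illustrate what the ICA step computes, here is a minimal NumPy sketch of the FastICA fixed-point algorithm (Hyvärinen, 1999), the algorithm the R package ica provides as `icafast`. It is not the package's implementation; the logcosh contrast, symmetric decorrelation, convergence settings, and the function name `fast_ica` are assumptions chosen for illustration. For expression data, `X` would be a cells × variable-genes matrix, `S` would hold per-cell IC scores, and `M` would hold per-gene loadings.

```python
import numpy as np


def _sym_decorrelate(W: np.ndarray) -> np.ndarray:
    """Return (W W^T)^(-1/2) W, i.e. W with orthonormalized rows."""
    s, u = np.linalg.eigh(W @ W.T)
    s = np.clip(s, np.finfo(float).tiny, None)
    return (u * (1.0 / np.sqrt(s))) @ u.T @ W


def fast_ica(X: np.ndarray, n_components: int, max_iter: int = 500,
             tol: float = 1e-6, seed: int = 0):
    """Minimal symmetric FastICA with the (assumed) logcosh contrast.

    X: samples x features (e.g. cells x variable genes).
    Returns (S, M): S is samples x n_components estimated sources and
    M is features x n_components mixing ("loading") matrix, with
    X - X.mean(axis=0) approximately equal to S @ M.T.
    """
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    # Whitening: keep the top principal axes and rescale them so that
    # Z.T @ Z / n is the identity matrix.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :n_components] * np.sqrt(n)

    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(max_iter):
        Y = Z @ W.T                      # current source estimates, n x k
        g = np.tanh(Y)                   # G'(u) for G(u) = log cosh u
        g_prime = 1.0 - g ** 2           # G''(u)
        # Fixed-point step for every row w: E[z g(w.z)] - E[g'(w.z)] w
        W_new = _sym_decorrelate(g.T @ Z / n - g_prime.mean(axis=0)[:, None] * W)
        # Converged once each row is (up to sign) unchanged.
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    S = Z @ W.T
    # Mixing matrix as the least-squares solution of Xc ~= S @ M.T.
    M = np.linalg.lstsq(S, Xc, rcond=None)[0].T
    return S, M


# Toy check: unmix a sine wave and a square wave from two linear mixtures.
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
mixed = sources @ np.array([[1.0, 1.0], [0.5, 2.0]]).T
S_est, M_est = fast_ica(mixed, n_components=2)
```

The recovered sources match the originals only up to order, sign, and scale, which is inherent to ICA; this is one reason components are usually ranked by a separate criterion such as variance accounted for.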
The resulting independent components (ICs) pool information across
multiple correlated genes and thus represent ‘meta-genes’ (Setty et al.,
2016) that are robust to drop-out events in single-cell RNA-seq data. We noticed that
the variance accounted for by successive components dropped off after
IC25, while GO term enrichment analysis using Enrichr (Chen et al., 2013;
Kuleshov et al., 2016) found no significant enrichment beyond IC20. In
addition, the genes with the strongest IC14 loadings were predominantly
mitochondrial. We therefore used ICs 1 to 20, excluding IC14, for
downstream analysis.
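The component-selection criteria above can be approximated in code, as a hedged sketch only. The variance-accounted-for definition (squared norm of each component's rank-one term over the total sum of squares) is an assumption and may differ from the quantity reported by the ica package. Flagging mitochondrial components by the fraction of top-loading genes with an "MT-" prefix, and the thresholds `n_top=50` and `max_mito=0.5`, are hypothetical stand-ins for the manual inspection of IC14 described in the text; the GO enrichment check is not reproduced. All function names are hypothetical.

```python
import numpy as np


def variance_accounted_for(S: np.ndarray, M: np.ndarray, Xc: np.ndarray) -> np.ndarray:
    """Per-component share of the total sum of squares of the centered data Xc.

    Uses ||s_j m_j^T||_F^2 = ||s_j||^2 * ||m_j||^2 for the rank-one term of
    component j (an assumed definition of 'variance accounted for').
    """
    return (S ** 2).sum(axis=0) * (M ** 2).sum(axis=0) / (Xc ** 2).sum()


def mito_fraction(M: np.ndarray, gene_names, n_top: int = 50,
                  prefix: str = "MT-") -> np.ndarray:
    """For each component (column of M), the fraction of its n_top genes with
    the largest absolute loadings whose names start with `prefix`
    (case-insensitive, so mouse 'mt-' genes also match)."""
    is_mito = np.array([g.upper().startswith(prefix.upper()) for g in gene_names])
    top = np.argsort(-np.abs(M), axis=0)[:n_top]    # gene indices, n_top x k
    return is_mito[top].mean(axis=0)


def select_components(M: np.ndarray, gene_names, n_keep: int = 20,
                      max_mito: float = 0.5, n_top: int = 50) -> list:
    """1-based numbers of the first n_keep components, dropping any whose
    strongest-loading genes are mostly mitochondrial.

    The thresholds n_top and max_mito are illustrative assumptions.
    """
    frac = mito_fraction(M[:, :n_keep], gene_names, n_top=n_top)
    return [j + 1 for j in range(frac.size) if frac[j] <= max_mito]
```

With `n_keep=20` and a loading matrix in which only IC14 is mitochondrially dominated, `select_components` would return ICs 1 to 20 without 14, mirroring the selection described above.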