Before and after figures

Figure 1A shows the PCA before COMBAT correction, and Figure 1B shows the PCA after correction. Before correction, the samples are ... After correction, the samples are ...

Discussion

COMBAT removed batch effects, but it did so by removing every gene expressed in fewer than 20% of cells. Marker genes of a cell type that makes up less than a fifth of the data are typically expressed in fewer than 20% of cells, so they are discarded as well. Researchers interested in such rare cell types will therefore find that COMBAT removes the signal of interest along with the technical signal.
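To make the rare-cell-type concern concrete, the sketch below applies a generic detection-fraction filter (keep genes with a nonzero count in at least 20% of cells) to simulated data. It is not COMBAT's code: the helper `filter_by_detection_fraction`, the simulated counts, and the 5% rare population are all hypothetical.

```python
import numpy as np

def filter_by_detection_fraction(counts, min_frac=0.2):
    """Keep genes (rows) with a nonzero count in at least `min_frac` of cells.

    Hypothetical helper illustrating the 20% filter discussed above;
    `counts` is a genes x cells array.
    """
    detected_frac = (counts > 0).mean(axis=1)  # per-gene fraction of cells with a detection
    keep = detected_frac >= min_frac
    return counts[keep], keep

rng = np.random.default_rng(0)
n_cells = 1000
rare = np.zeros(n_cells, dtype=bool)
rare[:50] = True  # hypothetical rare cell type: 5% of cells

housekeeping = rng.poisson(5.0, size=n_cells)               # detected in almost every cell
marker = np.where(rare, rng.poisson(5.0, size=n_cells), 0)  # expressed only in the rare cells
counts = np.vstack([housekeeping, marker])                  # 2 genes x 1000 cells

filtered, keep = filter_by_detection_fraction(counts, min_frac=0.2)
print(keep)            # which genes survive the filter
print(filtered.shape)  # the rare-cell marker is no longer in the matrix
```

Because the marker is detected in at most 5% of cells, it is dropped before any correction runs, and no downstream step can recover it.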

VAMF - by Yiran Hou

"Varying-Censoring Aware Matrix Factorization for Single Cell RNA-Sequencing" \cite{Townes_2017}
• Paper DOI: https://doi.org/10.1101/166736
• GitHub link: https://github.com/singlecell-batches/vamf
 
Single-cell RNA-sequencing suffers from a high dropout rate in gene-level detection, which produces a large number of zero counts in the digital gene expression matrix. These zeros do not faithfully represent low expression levels and can bias factor estimation when the data are log-transformed. In addition, variation in the number of zero counts per cell can lead to spurious clusters that reflect technical rather than biological variation (Hicks et al., 2017).
To model censoring and per-cell variation in zero counts simultaneously, Townes et al. developed VAMF (Varying-Censoring Aware Matrix Factorization; Townes et al., 2017). VAMF extends the censoring model of ZIFA (Zero-Inflated Factor Analysis; Pierson and Yau, 2015) by adding parameters that account for cell-to-cell variation in censoring.
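ZIFA models the probability that a gene is observed as zero with a single dropout curve, p0 = exp(-λ x²), where x is the latent log-scale expression level and λ is shared by all cells. The sketch below contrasts that shared λ with a hypothetical per-cell λ_j. This is only a simplified stand-in for the idea of cell-specific censoring; VAMF's actual censoring model is defined differently in Townes et al. (2017), and the function name and parameter values here are illustrative.

```python
import numpy as np

def expected_zero_fraction(mu, lam):
    """Expected fraction of zeros per cell under a ZIFA-style dropout curve.

    mu  : (n_genes,) latent log-scale expression levels
    lam : (n_cells,) dropout decay parameter for each cell (a hypothetical
          per-cell version; ZIFA itself uses one shared lambda)
    Returns an (n_cells,) array: the mean over genes of exp(-lam_j * mu_g**2).
    """
    mu = np.asarray(mu, dtype=float)
    lam = np.asarray(lam, dtype=float)
    p_drop = np.exp(-lam[:, None] * mu[None, :] ** 2)  # (n_cells, n_genes)
    return p_drop.mean(axis=1)

mu = np.linspace(0.5, 3.0, 50)  # hypothetical latent expression levels

shared = expected_zero_fraction(mu, np.full(4, 0.5))                 # one lambda for all cells
varying = expected_zero_fraction(mu, np.array([0.1, 0.5, 1.0, 2.0]))  # one lambda per cell

print(np.round(shared, 3))
print(np.round(varying, 3))
```

With a shared λ every cell has the same expected zero fraction, so the model has to explain per-cell differences in zeros through the latent expression values (and hence the factors). A cell-specific censoring parameter lets those differences be attributed to technical censoring instead.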
We ran VAMF alongside PCA on the 10% cell subset and compared how well each method separates cells \cite{Townes_2017}. Genes detected in 10 or fewer cells were removed from the digital gene expression matrix, reducing the number of genes from 46,243 to 15,665.
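The preprocessing in the paragraph above can be sketched in numpy as follows. The text does not say how the 10% subset was drawn, so uniform random sampling without replacement is an assumption, as are the helper names `subsample_cells` and `drop_rare_genes`; the matrix is genes x cells.

```python
import numpy as np

def subsample_cells(counts, frac=0.1, seed=0):
    """Keep a random fraction of cells (columns), sampled without replacement.

    Assumption: the original subset may have been drawn differently.
    """
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[1]
    n_keep = max(1, int(round(frac * n_cells)))
    idx = np.sort(rng.choice(n_cells, size=n_keep, replace=False))
    return counts[:, idx]

def drop_rare_genes(counts, max_cells=10):
    """Remove genes (rows) detected (count > 0) in `max_cells` or fewer cells."""
    n_detected = (counts > 0).sum(axis=1)
    return counts[n_detected > max_cells]

# Toy matrix: gene 0 is detected in all 200 cells, gene 1 in exactly 10, gene 2 in exactly 11.
toy = np.zeros((3, 200), dtype=int)
toy[0, :] = 1
toy[1, :10] = 1
toy[2, :11] = 1

print(drop_rare_genes(toy).shape)  # gene 1 (10 cells) is removed
print(subsample_cells(toy).shape)  # 10% of the 200 cells
```

The text does not state whether the gene filter was applied before or after subsetting; the two helpers can be composed in either order.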
 
We computed the top principal components and VAMF factors using default parameters. In both methods, the first component explains the largest share of the variation among cells (PC1: 8.1%; VAMF dimension 1, dimension learning: 47.3%). PC1 is almost perfectly correlated with the per-cell detection rate (Spearman r = -0.99), whereas the first VAMF factor is only moderately correlated with it (Figure; Spearman r = 0.65). The second VAMF factor is strongly correlated with the detection rate (Spearman r = -0.93), but its share of the dimension learning is much smaller (VAMF dimension 2: 15.9%). This suggests that VAMF is less driven by detection rate than PCA is, consistent with the performance reported by Townes et al. (2017). However, because the detection-rate distributions are similar across the batches in our dataset (Figure), accounting for detection rate does not address the differences between batches, and VAMF did not remove the batch effects in this dataset (Figure).
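The PCA side of this diagnostic can be sketched as follows: compute each cell's detection rate (fraction of genes with a nonzero count), run PCA on log1p-transformed counts, and take the Spearman correlation between the PC1 scores and the detection rate. It runs on simulated data in which cells differ only in a hypothetical capture efficiency, so PC1 should track detection rate closely. The log1p transform, the simulation, and the helper names are assumptions, and VAMF itself is not reimplemented here.

```python
import numpy as np

def detection_rate(counts):
    """Fraction of genes with a nonzero count in each cell (genes x cells input)."""
    return (counts > 0).mean(axis=0)

def pca_scores(counts, n_components=2):
    """PCA of log1p counts via SVD; returns cell scores and variance-explained ratios."""
    x = np.log1p(counts.T.astype(float))  # cells x genes
    x -= x.mean(axis=0)                   # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], var_ratio[:n_components]

def average_ranks(v):
    """Ranks of v, with tied values sharing their average rank."""
    v = np.asarray(v, dtype=float)
    ranks = np.empty(len(v))
    ranks[np.argsort(v, kind="mergesort")] = np.arange(len(v), dtype=float)
    _, inv = np.unique(v, return_inverse=True)
    inv = inv.ravel()
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]

def spearman(a, b):
    """Spearman correlation: Pearson correlation of tie-averaged ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]

# Hypothetical data: cells differ only in capture efficiency, so detection rate varies.
rng = np.random.default_rng(1)
base = rng.gamma(shape=0.5, scale=2.0, size=500)  # per-gene mean expression
efficiency = rng.uniform(0.05, 1.0, size=200)     # per-cell capture efficiency
counts = rng.poisson(np.outer(base, efficiency))  # 500 genes x 200 cells

scores, var_ratio = pca_scores(counts)
rho = spearman(scores[:, 0], detection_rate(counts))
print(f"PC1 variance explained: {var_ratio[0]:.1%}")
print(f"Spearman rho, PC1 vs detection rate: {rho:.2f}")  # sign of PC1 is arbitrary
```

Ties are averaged in `average_ranks` because detection rates are fractions of a fixed gene count and often coincide; `scipy.stats.spearmanr` would give the same value if SciPy is available.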