Data filtering and screening
Preliminary cluster analyses revealed retinal contamination in a subset of the brain samples in our Aripo dataset. While opsins are expressed at low levels in the brain, the very high expression levels (>10,000 copies) observed in three samples pointed to retinal contamination. To address this issue, we devised a filtering and screening procedure to remove genes whose expression differences between samples were likely dominated by retinal contamination. Briefly, we first filtered out genes with low expression. We then used contigs annotated as known retinal genes (rhodopsin, red/green-sensitive opsins, blue-sensitive opsins) as seed contigs to identify other contamination-related transcripts, based on high positive correlations between their expression levels and those of the seed genes. For each candidate gene, we calculated the sum of its correlations with the seed genes and corrected for multiple hypothesis testing using a false discovery rate (FDR) controlling procedure. The nominal FDR level was set to α = 0.2 for identifying presumptive contaminant contigs. Using this approach, we identified 1,559 contigs as presumptive retina-enriched genes (~3% of all contigs in our final assembly), which we removed from both datasets for all subsequent analyses (Table S1). More detailed descriptions of the statistical procedures are given in the Supplemental Methods.
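The screening logic described above can be illustrated with a minimal Python sketch. It is not the procedure we ran (that is given in the Supplemental Methods); several pieces are assumptions made purely for illustration: the function names (`flag_contaminants`, `correlation_unit_rows`, `benjamini_hochberg`), the low-expression cutoff `min_mean`, the use of Pearson correlation, a permutation null obtained by shuffling the sample labels of the seed genes, and the Benjamini-Hochberg procedure as the FDR-controlling step. Only the overall sequence (filter, correlate with seeds, sum correlations, control FDR at α = 0.2) comes from the text.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.2):
    """Boolean mask of hypotheses rejected by Benjamini-Hochberg at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject


def correlation_unit_rows(x):
    """Center each row and scale it to unit norm, so row dot products are Pearson r."""
    z = x - x.mean(axis=1, keepdims=True)
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def flag_contaminants(expr, seed_idx, min_mean=1.0, alpha=0.2,
                      n_perm=1000, rng_seed=0):
    """Return row indices of genes flagged as presumptive contaminants.

    expr     : genes x samples expression matrix
    seed_idx : rows of known retinal (seed) genes
    min_mean : assumed low-expression cutoff (hypothetical value)
    """
    rng = np.random.default_rng(rng_seed)
    expr = np.asarray(expr, dtype=float)
    seed_idx = np.asarray(seed_idx)

    # Step 1: drop lowly expressed and constant genes; seeds are not tested.
    is_candidate = (expr.mean(axis=1) >= min_mean) & (expr.std(axis=1) > 0)
    is_candidate[seed_idx] = False
    cand_idx = np.nonzero(is_candidate)[0]

    cand_z = correlation_unit_rows(expr[cand_idx])
    seed_z = correlation_unit_rows(expr[seed_idx])

    # Step 2: gene-wise sum of correlations with the seed genes.
    observed = (cand_z @ seed_z.T).sum(axis=1)

    # Step 3 (assumption): one-sided permutation p-values, shuffling the
    # sample labels of the seed genes to break the gene-seed association.
    exceed = np.zeros(cand_idx.size)
    for _ in range(n_perm):
        perm = rng.permutation(expr.shape[1])
        null = (cand_z @ seed_z[:, perm].T).sum(axis=1)
        exceed += null >= observed
    pvals = (1 + exceed) / (1 + n_perm)

    # Step 4: control the FDR at the nominal level alpha.
    return cand_idx[benjamini_hochberg(pvals, alpha)]
```

Calling `flag_contaminants(expr, seed_idx=[...])` on a genes-by-samples matrix returns the rows to exclude from downstream analyses. In this sketch the statistic is one-sided, so only genes positively correlated with the seed genes can be flagged.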