One of the most common experiments in plant biology research is to compare the gene expression profiles of wild-type and mutant Arabidopsis using RNA-seq.  However, mutant strains may develop at different rates and show substantial phenotypic differences from wild type.  For an RNA-seq experiment to provide valuable new biological insights, it is necessary to identify gene expression changes that cannot be explained by differences in age or tissue ratio; otherwise, the expression changes do nothing but confirm phenotypic observations.  In this manuscript, we present a new method called TissueTimer that categorises differentially expressed genes into (i) those that can be explained by differences in tissue ratio between samples, (ii) those that can be explained by differences in developmental stage, and (iii) those that cannot be explained by either of these factors.  This allows us to prioritise genes for further investigation in follow-up experiments.  TissueTimer can also evaluate whether pairs of samples have significantly different tissue ratios or ages.  We apply the method to a large number of pre-existing RNA-seq samples and show that it yields new biological insights.
TissueTimer is trained using a tissue and developmental gene expression atlas.  First, marker genes are identified using CIBERSORT (CITE): these are genes that successfully distinguish between tissues and ages but have stable expression between replicates.  The marker genes include known tissue and developmental genes (Fig S2, Table S1).  Then, given a new RNA-seq sample, we estimate its tissue ratio in the same way as CIBERSORT, using Support Vector Regression (SVR).  We predict the tissue ratio of the new sample at a discrete number of developmental stages, smooth the result with splines, and find the time point that minimises the sum of the residual error.  However, this strategy gives only a single 'best' prediction of tissue ratio and age and does not indicate how confident we are in the results.  To find the probability distribution of tissue ratios and ages, we use simulations of technical noise in RNA-seq to find the probability of