
A priori estimation of sequencing effort in complex microbial metatranscriptomes
  • Antonio Monleon-Getino,
  • Jorge Frias-Lopez
Antonio Monleon-Getino
University of Barcelona
Jorge Frias-Lopez
University of Florida

Abstract

1. Accurate differential expression analysis of microbial metatranscriptomes based on next-generation sequencing depends partly on the depth of the libraries used. Estimating the sequencing depth required to sample the metatranscriptome of interest effectively with RNA-seq is therefore an essential first step, both to obtain robust results in downstream analysis and to avoid overspending once the information contained in the library reaches saturation.

2. Here we present a method to calculate sequencing effort from saturation curves and to predict the number of genes a priori, using a simulated series of metatranscriptomic/metagenomic matrices. The method extrapolates the rarefaction curve with a Weibull growth model, fitted using a machine learning approach, to estimate the maximum number of genes/OTUs as a function of sequencing depth. This approach allows us to compute the effort at different confidence intervals and to obtain an approximate a priori effort based on an initial fraction of sequences.

3. The accuracy of the results obtained with simulations and real samples (15 metatranscriptome datasets from the oral cavity; RNA sequence vectors with depths of 10^5 to 1.5 × 10^7 reads and sizes between 10,000 and 600,000 genes) allows one to use an initial shallowly sequenced sample (in this case 20% of the total number of reads sampled; accuracy R^2 > 0.99 for simulated samples and 60-93% for real samples) to estimate the sequencing effort needed to cover the whole metatranscriptome/metagenome of the same sample, and so to estimate the sample size. The algorithm implementing the proposed method was saved as an R function.

4. The proposed method, which estimates the maximum number of genes/OTUs and the number of reads needed to reach 90, 95 and 99% of that maximum, helps researchers determine whether the sampling is sufficient or needs to be increased.
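
To make the extrapolation idea in the abstract concrete, the following is a minimal Python sketch, not the authors' R function. It assumes one common three-parameter form of the Weibull growth model, y = a(1 - exp(-(x/b)^c)), where x is the number of reads, y the number of genes/OTUs detected, and a the asymptotic maximum. The abstract does not state the exact parameterization, so this form, the grid-search fit, the function names, and the noise-free toy data are all illustrative assumptions. Confidence intervals and the machine learning step are not reproduced.

```python
import math


def weibull_growth(x, a, b, c):
    """Genes/OTUs expected at depth x under y = a * (1 - exp(-(x / b) ** c))."""
    return a * (1.0 - math.exp(-((x / b) ** c)))


def reads_for_fraction(p, b, c):
    """Depth at which the curve reaches fraction p (0 < p < 1) of the asymptote a."""
    return b * (-math.log(1.0 - p)) ** (1.0 / c)


def fit_weibull(depths, genes, n_grid=1000, step=0.01):
    """Fit (a, b, c) by scanning candidate asymptotes a above the largest
    observation. For each a, the model linearises to
    ln(-ln(1 - y / a)) = c * ln(x) - c * ln(b), so b and c follow from
    ordinary least squares; the a with the smallest squared error
    (measured on the original scale) is kept."""
    xs = [math.log(x) for x in depths]
    mean_x = sum(xs) / len(xs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    best = None
    for i in range(1, n_grid + 1):
        a = max(genes) * (1.0 + step * i)
        zs = [math.log(-math.log(1.0 - y / a)) for y in genes]
        mean_z = sum(zs) / len(zs)
        c = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs)) / sxx
        if c <= 0:
            continue  # a decreasing curve is not a valid growth model
        b = math.exp(mean_x - mean_z / c)
        sse = sum((weibull_growth(x, a, b, c) - y) ** 2
                  for x, y in zip(depths, genes))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]


# Hypothetical noise-free curve: 50,000 genes in total, observed only over
# the first 20% (2e6 reads) of a 1e7-read library.
true_a, true_b, true_c = 50_000.0, 2.0e6, 0.7
pilot_depths = [1.0e5 * k for k in range(1, 21)]
pilot_genes = [weibull_growth(x, true_a, true_b, true_c) for x in pilot_depths]

a_hat, b_hat, c_hat = fit_weibull(pilot_depths, pilot_genes)
print(f"estimated maximum genes: {a_hat:.0f}")
for p in (0.90, 0.95, 0.99):
    print(f"reads to reach {p:.0%}: {reads_for_fraction(p, b_hat, c_hat):.3g}")
```

Under this assumed form the effort for a target coverage has a closed-form inverse, which is why only b and c appear in `reads_for_fraction`. With noisy real data a proper nonlinear least-squares routine, plus resampling for interval estimates, would likely be a better choice than this grid scan.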

Peer review status: ACCEPTED

30 Apr 2020 Submitted to Ecology and Evolution
01 May 2020 Submission Checks Completed
01 May 2020 Assigned to Editor
04 May 2020 Reviewer(s) Assigned
06 Jun 2020 Review(s) Completed, Editorial Evaluation Pending
08 Jun 2020 Editorial Decision: Revise Minor
24 Jul 2020 1st Revision Received
25 Jul 2020 Submission Checks Completed
25 Jul 2020 Assigned to Editor
25 Jul 2020 Review(s) Completed, Editorial Evaluation Pending
31 Jul 2020 Reviewer(s) Assigned
22 Sep 2020 Editorial Decision: Accept