1. INTRODUCTION
Our planet is losing biodiversity at an unprecedented rate, and there is an urgent need to map Earth's biodiversity in order to assess how it is affected by global climate change. The ocean contains 97% of all water on our planet and is thus a fundamental biodiversity reservoir and driver of global ecology. Marine plankton form the base of ocean food webs and play a major role in the planet's biogeochemical balance, accounting for almost half of global net primary production (Falkowski et al., 2008; Field et al., 1998); they thus drive ocean oxygen production and the biological carbon pump (Guidi et al., 2016). Global ocean physics and chemistry are changing rapidly, however, and plankton diversity and geographic distribution are expected to be fundamentally altered in the coming decades (Ibarbalz et al., 2019).
Since the first large-scale DNA sequencing survey of marine plankton, undertaken by the Global Ocean Sampling expedition in 2007 (Rusch et al., 2007), other planetary-scale expeditions have deployed holistic sampling protocols to assess ocean ecosystems. Importantly, these later expeditions have also measured the in situ biogeochemical parameters that provide the environmental context necessary for the ecological interpretation of plankton communities. One such international endeavour, Tara Oceans 2009-2013 (Karsenti et al., 2011), sampled organisms ranging from viruses to zooplankton using a standardized pan-ecosystemic protocol at 210 globally distributed stations and at three depths down to 1,000 m. The Malaspina-2010 (2010-2011) global circumnavigation expedition (Duarte, 2015) applied a similar approach, with particular emphasis on sampling the dark meso- and bathypelagic tropical and subtropical waters from the surface down to 4,000 m.
During the same decade, rapid progress in high-throughput sequencing (HTS) technology has led to a thorough reassessment of biodiversity in ecosystems and biomes. In particular, deep sequencing of environmental DNA or RNA amplicons can now reveal prokaryotic and eukaryotic diversity close to saturation, even in the richest samples (Geisen et al., 2019). This metabarcoding approach has provided comprehensive surveys of the biological communities contained in plankton samples collected during the Tara Oceans and Malaspina expeditions. The resulting ocean metabarcodes have allowed a re-evaluation of eukaryotic diversity (de Vargas et al., 2015), a global description of plankton biogeography (Richter et al., 2019), and insights into the key plankton players in carbon export (Guidi et al., 2016).
However, the terabyte scale and complexity of these new datasets restrict access to specialized bioinformatics teams, leaving most researchers interested in plankton diversity unable to use them. Beyond the sheer volume of sequencing reads, the need to cluster and annotate them and to link them to environmental data means that these valuable data remain underexploited by biological oceanographers. Simple, user-friendly tools for accessing and extracting biologically meaningful information, such as those developed for marine gene catalogs derived from metagenomes and metatranscriptomes (Villar et al., 2018), have so far been lacking for metabarcode datasets.
The Ocean Barcode Atlas (OBA) has been developed to help ocean researchers without specific bioinformatics expertise easily explore metabarcodes (metaB) of interest across the global ocean ecosystem, using nothing more than a web browser. Robust quantitative and contextualized analyses are carried out on the fly within minutes, compared with the several hours (more often days) of specialized bioinformatics computation on dedicated high-performance hardware that would otherwise be required. The OBA service (http://tara-oceans.mio.osupytheas.fr/) is independent of, but complementary to, the previously described Ocean Gene Atlas (OGA, http://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/; Villar et al., 2018). The OBA reported here relies on metabarcode sequences and therefore allows users to explore plankton biodiversity from a taxonomic perspective, answering questions such as "how is a specific plankton taxon distributed across the oceans?". The previously published OGA, being based on metagenomic sequences, is designed to explore the biogeography of plankton gene functions, enabling users to answer questions such as "where in the marine biome are genes related to anaerobic ammonium oxidation found?".
The initial version of the OBA integrates three large metabarcode datasets: i) the Tara Oceans 18S-V9 rRNA metaB (de Vargas et al., 2015; Ibarbalz et al., 2019), ii) the Tara Oceans 16S/18S rRNA miTags (Logares et al., 2014; Salazar et al., 2019), and iii) the Malaspina-2010 16S-V4V5 rRNA metaB (Salazar et al., 2015).