1. INTRODUCTION
Our planet is losing biodiversity at an unprecedented rate, and there is
an urgent need to map total biodiversity on Earth in order to assess how
biodiversity is affected by global climate change. The ocean contains
97% of all water on our planet and is thus a fundamental biodiversity
reservoir and driver of global ecology. Marine plankton form the base of
ocean food webs and play a major role in the planet’s biogeochemical
balance: they account for almost half of global net primary
production (Falkowski et al., 2008; Field et al., 1998), and thus drive
ocean oxygen production and the biological carbon pump (Guidi et al.,
2016). However, global ocean physics and chemistry are changing rapidly
and it is expected that plankton diversity and geographic distribution
will be fundamentally altered in the coming decades (Ibarbalz et al.,
2019).
Since the first large-scale DNA sequencing survey of marine
plankton, undertaken by the Global Ocean Sampling expedition in 2007
(Rusch et al., 2007), other planetary-scale expeditions have deployed
holistic sampling protocols to assess ocean ecosystems. Importantly, the
latter have measured the in situ biogeochemical parameters that
provide the environmental context necessary for ecological
interpretation of plankton communities. One such international
endeavour, Tara Oceans 2009-2013 (Karsenti et al., 2011), sampled
organisms from viruses to zooplankton using a standardized
pan-ecosystemic protocol at
210 globally distributed stations and three depths down to 1,000 m. The
Malaspina-2010 (2010-2011) global circumnavigation expedition (Duarte,
2015) applied a similar approach with a particular emphasis on sampling
the dark meso- and bathypelagic tropical and subtropical waters from
the surface down to 4,000 m depth.
During the same decade, rapid progress in high-throughput DNA sequencing
(HTS) technology has led to a thorough re-assessment of biodiversity in
ecosystems and biomes. In particular, deep sequencing of environmental
DNA or RNA amplicons can now reveal prokaryotic and eukaryotic
biological diversity close to saturation in even the richest samples
(Geisen et al., 2019). This metabarcoding approach has provided
comprehensive surveys of the biological communities contained in plankton
samples collected during the Tara Oceans and Malaspina expeditions. The
resulting ocean metabarcodes have allowed a
re-evaluation of eukaryotic diversity (de Vargas et al., 2015), a global
description of plankton biogeography (Richter et al., 2019), and
insights into key plankton players in carbon export (Guidi et al.,
2016).
However, the terabyte scale and complexity of these new datasets
restrict their access to specialized bioinformatics teams, leaving the
large majority of researchers interested in plankton diversity unable
to use them. Beyond the sheer volume of sequencing reads, the need to
cluster and annotate them and to connect them to environmental data
contributes to leaving these valuable data underexploited by biological
oceanographers. Simple, ergonomic tools to access and extract
biologically meaningful information, such as those developed for marine
gene catalogs derived from metagenomes and metatranscriptomes (Villar et
al., 2018), have so far been lacking for metabarcode datasets. The Ocean
Barcode Atlas (OBA) has been developed to help ocean researchers
without specific bioinformatics expertise easily explore metabarcodes
(metaB) of interest across the global ocean ecosystem using nothing more
than a web browser. Robust quantitative and contextualized analyses are
carried out on the fly within minutes, compared to the several hours
(more frequently days) of specialized bioinformatics computation on
dedicated high-performance hardware that are required without such a web
service. The OBA service (http://tara-oceans.mio.osupytheas.fr/) is
independent of, but complementary to, the previously described Ocean Gene
Atlas (OGA, http://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/;
Villar et al., 2018). The OBA reported here relies on
metabarcode sequences, and as such allows users to explore plankton
biodiversity from a taxonomic perspective, answering questions such as
“how is a specific plankton taxon distributed across the oceans?”. The
previously published OGA, being based on metagenomic sequences, is
designed to explore the biogeography of plankton gene functions,
enabling users to answer questions such as “where in the marine biome
are genes related to anaerobic ammonium oxidation to be found?”.
The initial version of the OBA integrates three large
metabarcode datasets: i) the Tara Oceans 18S-V9 rRNA metaB (de
Vargas et al., 2015; Ibarbalz et al., 2019), ii) the Tara Oceans
16S/18S rRNA miTags (Logares et al., 2014; Salazar et
al., 2019) and iii) the Malaspina-2010 16S-V4V5 rRNA metaB (Salazar et
al., 2015).