1. Introduction
Over two-thirds of the Earth is covered by oceans, likely sheltering a high level of still poorly studied biodiversity, particularly in the deep sea (Costello & Chaudhary, 2017; Costello, Cheung, & De Hauwere, 2010). Today, molecular approaches provide nonintrusive methods to study the diversity of marine environments, even those that are difficult to access for sampling. The analysis of environmental DNA (eDNA; Taberlet, Coissac, Hajibabaei, & Rieseberg, 2012) represents a promising path to inventory biodiversity and sets the ground for the development of molecular biomonitoring protocols (Andruszkiewicz et al., 2017; Apothéloz-Perret-Gentil et al., 2017; Bohan et al., 2017; Cordier et al., 2018; Derocles et al., 2018). Approaches based on eDNA target the genetic material present in the environment (air, ground, sediment, or water, all relatively easy to access and sample) (Bohmann et al., 2014; Thomsen & Willerslev, 2015), allowing us to unravel the nature of macro- and microorganisms present in the surrounding habitats. First developed for the uncultivable majority of the microbial world (Xu, 2006), metabarcoding approaches, which rely on PCR-based amplicon sequence identification combined with high-throughput sequencing, were transferred to eukaryotes early on (Creer et al., 2010; Hajibabaei, Shokralla, Zhou, Singer, & Baird, 2011; Taberlet et al., 2012; Valentini, Pompanon, & Taberlet, 2009).
Over the last decade, metabarcoding protocols have been improved, from sampling through to the bioinformatic steps, to optimize their resolution and interpretation. Nevertheless, biomonitoring and biodiversity inventory using metabarcoding remain challenging (Miya et al., 2015; Yamamoto et al., 2017) for two main reasons. First, the method relies on PCR-based DNA enrichment, which suffers from biases due to unequal amplification across taxa (PCR bias) and from artifacts (PCR errors leading to sequence errors), resulting in biased biodiversity inventories (Acinas, Sarma-Rupavtarm, Klepac-Ceraj, & Polz, 2005; Kanagawa, 2003; Sefc, Payne, & Sorenson, 2007; Smyth et al., 2010). Second, the evolution of high-throughput sequencing technologies available on the market has led to higher yields but shorter sequenced fragments, restricting metabarcoding to short fragments. Such short fragments (usually 150 to 450 base pairs) lead to a less reliable assignment of sequences to taxa and hamper the use of the data produced for comprehensive phylogenetic reconstructions. This is particularly limiting in ecosystems where biodiversity is poorly described and reference databases contain large gaps: many unassigned sequences may correspond to existing but undescribed biodiversity, yet teasing them apart from spurious sequences would require phylogenetic reconstruction. The limiting factor for taxonomic assignment of deep-sea organisms is the general lack of sequence references in marine systems. Some major groups, such as nematodes, which are the most abundant and diverse benthic metazoan taxa, can rarely be identified genetically (Dell’Anno, Carugati, Corinaldesi, Riccioni, & Danovaro, 2015; Gambi & Danovaro, 2016). Thus, long, high-quality barcode libraries are needed to improve taxonomic identification in general, and especially for poorly known groups.
Theoretically, direct metagenomic sequencing (such as shotgun sequencing) could solve these limitations, as these sequences can also be reconstructed from eDNA to obtain a comprehensive overview of the taxonomic diversity of the studied community, free of PCR bias (Porter & Hajibabaei, 2018) and allowing reliable phylogenies based on long fragments. However, the production of metagenomes is still extremely costly and results in a dominance of prokaryotic sequences, and the de novo reconstruction of comprehensive metagenomes is highly time consuming; differentiating between biological differences and sequencing errors is hardly possible and highly limited by gaps in reference databases (Ghurye, Cepeda-Espinoza, & Pop, 2016; Quince, Walker, Simpson, Loman, & Segata, 2017).
As an intermediate, less expensive option that avoids the two main limitations associated with metabarcoding, two other methods of DNA enrichment are available (Mamanova et al., 2010; Mertes et al., 2011): the molecular inversion probe (MIP) and capture by hybridization (CBH). CBH exists in two variations, “on-array capture” on a solid microarray or “in-solution capture”, which takes place within a fluid medium (Gasc, Peyretaillade, & Peyret, 2016). Here, we use the latter, which was first described by Gnirke et al. (2009) for human exome resequencing, whereby hybrid probes are designed to enrich genomic DNA. While the initial cost of this system is high, multiplexing libraries to sequence several samples together (up to 96-well plates) has been shown to be highly efficient (Meyer & Kircher, 2010). Moreover, a diversity of probes (single-stranded sequences of DNA) designed in different locations of the target gene regions allows capturing a much wider diversity and recovering long fragments, thereby improving taxonomic assignment and allowing reasonable phylogenetic reconstruction (Denonfoux et al., 2013; Gasc & Peyret, 2018). Furthermore, a low concentration of DNA template is sufficient, allowing this method to be successfully used in low-biomass environments (such as air or deep-sea biomes) wherein generally lower DNA concentrations are obtained, as in deep oligotrophic aquifers (Ranchou-Peyruse et al., 2017). The first test using eDNA showed that a 100-fold lower concentration can be detected with CBH than with traditional methods (Seeber et al., 2019), while others mentioned reduced tractability for DNA with less than 0.1 ng of total gDNA (Wilcox et al., 2018). Testing within complex prokaryotic communities even allowed the detection of extremely rare members (less than 0.0001%) (Gasc & Peyret, 2018).
It has been suggested that the final success of this method depends strongly on the probes (Ribière et al., 2016) rather than on the initial biomass and DNA concentration.
Improved biodiversity assessments can thus be expected using CBH, by (i) avoiding PCR steps, yet targeting a broader range of biodiversity in a single reaction with a comprehensive and versatile set of probes, and (ii) reconstructing long fragments for full barcode regions, allowing reliable phylogenetic positioning and reconstruction. In recent years, this methodology has proven to markedly improve microbial diversity inventories with precise taxonomic affiliation at the species level (Gasc & Peyret, 2018). CBH was also applied to recover full-length microbial eukaryotic cDNAs in complex environmental samples (Bragalini et al., 2014) and to directly capture long DNA fragments (Gasc & Peyret, 2017). Additionally, CBH using mitochondrial barcodes for inventories of metazoans in bulk or ethanol-preserved samples resulted in a highly accurate census of species (Gauthier et al., 2020; Shokralla et al., 2016), and similar results were obtained when testing the detection of a broad range of metazoans, including mammals, from aquatic and sediment eDNA samples (Seeber et al., 2019; Wilcox et al., 2018). Finally, CBH represents a promising path for phylogenetic studies, as recently shown for butterflies (Kawahara et al., 2018).
With this study, we aimed to assess the potential of 16S and 18S rDNA enrichment by CBH coupled with high-throughput sequencing to explore the biodiversity of prokaryotes and eukaryotes, including metazoans, in the deep sea (~500-2800 m depth). We analyzed eDNA samples extracted from sediment to compare CBH with metabarcoding for the V4 16S rDNA region and the V1-V2 18S rDNA region.