3.2. Sequencing output and fragment length
CBH yielded 2,231,139 to 12,799,906 reads per sample. After the first filtering and trimming steps, approximately 7 to 34% of the reads were identified as rDNA (general sequencing output given in Appendix 4). For MTB, a total of 288,094 to 1,278,400 sequences were obtained per sample, leading to 2231 to 4133 OTUs for 16SV4 and 770 to 2169 OTUs for 18SV1V2 per sample. The CBH data were analyzed with two different pipelines. The first used the short fragments directly (hereafter referred to as CBH-short): the unaligned reads, with a mean length of 200 to 289 bp, were analyzed with Kraken 2. These reads were shorter than the fragments of up to 450 bp obtained with metabarcoding. The second used EMIRGE to reconstruct “full barcodes” (hereafter referred to as CBH-long), which yielded on average 731 near-full-length markers per sample, reaching up to 1200 to 1450 bp for archaea, up to 1600 bp for bacteria and 1200 to 1900 bp for eukaryotes (Fig. 2). However, for a small number of taxa, 60 to 95% of sequences were lost.
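As an illustration of how summary figures of this kind (mean fragment length, percentage of reads assigned to rDNA) can be derived, the following is a minimal sketch in Python. It is not the pipeline used in this study: the function names `read_lengths` and `percent`, the toy FASTA records and the read counts are all hypothetical and chosen only to make the arithmetic visible.

```python
from statistics import mean


def read_lengths(fasta_text):
    """Return the length of each sequence in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence lines may wrap, so lengths are accumulated per record.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def percent(part, total):
    """Percentage of `part` in `total`; returns 0.0 for an empty total."""
    return 100.0 * part / total if total else 0.0


# Toy data invented for illustration only (not from the study).
toy_fasta = ">read1\nACGTACGT\n>read2\nACGT\nAC\n>read3\nACGTACGTAC\n"

lengths = read_lengths(toy_fasta)
mean_len = mean(lengths)

# Hypothetical counts: 2 of 8 filtered reads assigned to rDNA.
rdna_pct = percent(2, 8)
```

On real data, the same two quantities would be computed per sample from the filtered read files and from the number of reads retained as rDNA, and the per-sample ranges then reported as above.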