Discussion
Ancient DNA from lake sediments constitutes a valuable resource to
investigate the response of populations to past environmental changes.
Previous studies using metabarcoding or shotgun sequencing have not yet
explored the full potential of this resource. Here, we applied shotgun
sequencing and hybridization capture using PCR-generated baits of theLarix chloroplast genome, to retrieve nearly complete chloroplast
genomes and study past changes in the population history of larches in
northern Siberia.
Taxonomic classification - majority of shotgun data could
not be
assigned
In the shotgun dataset, only 0.3% of quality filtered reads could be
classified against the nt database. Considering that fewer than 15,000
organisms of the 10 to 15 million of eukaryotic and perhaps trillions of
bacterial and archaeal species
have
currently been completely or partially sequenced
(Lewin et al., 2018), this is a
comparatively large fraction. However, other studies have reported
higher rates of assigned sequences. Slon et al.
(2017),
assigned more than 10% of their shotgun reads that derived from cave
sediments and Ahmed et al.
(2018)
reported 16% of assigned sequences from ancient lake sediments and 70%
of hits over multiple taxa. On the other hand, Pedersen et al.
(2016)
assigned 0.3% of the quality filtered reads obtained from ancient lake
sediments, and Parducci et al.
(2019)
reported a read assignment of 3.4% of sed aDNA shotgun
sequencing. Different sample material, wet lab processing and sequence
databases used for assignment make it difficult to properly compare the
assignment rate. In addition to these differences, the applied
bioinformatic approach has an impact on the results
(Harbert,
2018).
Here we used kraken2 (Wood, Lu, & Langmead,
2019),
a new version of kraken, which is a particularly conservative tool
compared to others, reporting less false-positive but also less
true-positive hits than others tools, even with default values
(Harbert,
2018).
We used it with a high confidence threshold as we found it gives the
best results in terms of vegetation composition based on our knowledge
of the vegetation history
(Epp
et al., 2018), but with the consequence of very low overall assignment
rates. Indeed, when we use the default values of kraken2, we could
assign 10–16% of the reads. However, more lenient classification
causes a reinforcement of the database bias: few deeply sequenced taxa
are more likely to be assigned than the majority of shallowly or
fragmentarily sequenced taxa
(Parducci
et al., 2017).
Capture success - Viridiplantae reads increased by orders of
magnitude
Hybridization capture resulted in an increase of taxonomically
classified reads by orders of magnitude – especially with respect to
the ratio of unclassified reads but also in total. These results show
for the first time that DNA capturing of whole chloroplast genomes is
effective even for DNA libraries that constitute a high diversity and
low on-target rates such as DNA from ancient lake sediments. Along withLarix many other plant taxa were enriched, as is shown by the
high number of Viridiplantae classified reads (Figure 1). This can be
explained by: (1) chloroplast genomes of land plants and green algae
sharing essentially the same set of protein-coding genes and ribosomal
RNAs and differing mainly in the presence/absence of introns and repeats
(B.
R. Green,
2011),
and (2) DNA libraries built from lake sediments contain a very high
sequence divergence, higher than those from pooled individuals, which in
the past were used to measure the capability of capturing highly
divergent reads
(Paijmans,
Fickel, Courtiol, Hofreiter, & Förster, 2016; Peñalba et al., 2014). We
hypothesize that the captured reads classified as other plants share, to
some extent, sequence similarities with conserved regions like genes or
ribosomal RNA.
The comparison of captured Larix- classified reads versus the full
capture samples gives a first hint of this: if we align the complete
dataset and the subset with the same relaxed parameters to theLarix chloroplast genome, the coverages are unequally distributed
between the different functional regions. The conserved protein coding
genes, especially genes coding for the photosystem complex, transfer RNA
and ribosomal RNA, have coverages significantly higher in the complete
dataset than in the Larix- classified dataset. In the case of
ribosomal RNA the difference is even orders of magnitude higher (Figure
2). This shows that sequences from species other than Larix can
align to these especially conserved regions. Several studies already
demonstrate enrichment of diverged sequences using hybridization
capture.
Peñalba
et al.
(2014)
show the applicability of hybridization capture to analyse taxa from 13
families with up to 27% sequence divergence, by targeting conserved
gene regions. On ancient material, Paijmans et al.
(2016)
captured mitochondrial sequences with up to 20 million years divergence
time with up to 25% sequence dissimilarity. With a modified relaxed
double capture method, Li et al.
(2013)
demonstrate a successful enrichment of nuclear genes across distantly
related species with up to 300 million years divergence time (with up to
40% sequence dissimilarity). All these studies demonstrate that
hybridization capture can be used to study a wide range of
phylogenetically divergent taxa, although they all note a decrease in
coverage with increasing evolutionary distance.
Near-complete retrieval of Larixchloroplast
genomes
The number of reads classified as the genus Larix, using the nt
database increased 800 to 1600-fold from shotgun to hybridization
capture dataset. However, in all samples there was a high level of PCR
duplicates due to the high number of amplifications after capturing and
possibly also due to oversequencing of reads. Future projects could
reduce the number of PCR cycles and increase the number of samples
pooled together to reduce duplicates.
The deduplication of reads prior to mapping against the reference genome
reduced the enrichment to 6–16-fold. This is in the range of results
from enrichment studies performed on bone material (Avila-Arcos 2015,
Carpenter 2013), although higher enrichment ranges have also been
reported (Mohandesan 2017). Here, most likely the sample material plays
a role, as the lake sediments contained a low content of the target
larch chloroplast DNA together with a high sequence diversity.
Nevertheless, the target enrichment rate could potentially be increased,
for example, by capturing at higher temperatures or using a touch-down
approach or by using RNA instead of DNA for baits
(Carpenter
et al., 2013; Paijmans et al., 2016; Peñalba et al., 2014).
In the capture dataset, near-complete chloroplast sequences could be
retrieved from the three oldest samples. But even the sample with the
highest coverage, i.e. the oldest from 6700 cal-BP, contained gaps in
the alignment (91% covered at 1-fold coverage). When comparing the
coverage of different functional groups of the genome (Figure 2), we see
the lowest coverage for ribosomal RNA, followed by protein coding genes.
This is due to the bioinformatics approach of extracting reads as
classified to the lowest common ancestor in kraken2. As these coding
regions are highly conserved across taxa
(B.
R. Green,
2011),
the short reads can also be attributed to other organisms and are
consequently classified to a higher taxonomic rank than Larix.Accordingly, when assigning the full, not taxonomically binned samples
in the same manner to functional groups, this trend reverses.
Analysis of DNA damage patterns revealed C to T substitution rates
typical for ancient DNA (Supplementary Figures S4, S5). Typical for the
preparation of single-stranded libraries, these substitutions could be
observed both at the 3’ and the 5’ end of the molecules
(Gansauge
& Meyer, 2013). C to T substitution rates increased with sample age, in
line with previous observation
(Pedersen
et al., 2016; Sawyer, Krause, Guschanski, Savolainen, & Pääbo, 2012).
Read lengths ranged from 40 bp to 340 bp, showing the short fragment
length typical for ancient DNA
(Green
et al., 2008).
L. sibirica variants present over
time
When comparing the ancient reads to chloroplast reference genomes fromL. gmelinii and L. sibirica , the great majority of reads
carry the L. gmelinii variants with a low frequency of L.
sibirica variants in all four samples. In contrast, the analysis
of one mitochondrial marker derived from the same core by Epp et
al .
(2018),
showed a mixture of both species, with relatively high rates of L.
sibirica – except for the most recent sample, which showed clear
dominance of the L. gmelinii mitotype – pointing to a
co-occurrence of both species across most of the sediment core. In the
genus Larix, chloroplasts are predominantly inherited paternally
(Szmidt,
Aldén, & Hällgren, 1987) whereas mitochondrial DNA is inherited
maternally
(DeVerno,
Charest, & Bonen, 1993), a phenomenon which has been reported for
almost all members of the conifers
(Neale
& Wheeler, 2019). This bi-parental inheritance results in different
rates of gene flow and subsequently asymmetric introgression patterns
(Du,
Petit, & Liu, 2009; Petit et al., 2005). Simulations
(Currat,
Ruedi, Petit, & Excoffier, 2008) and molecular studies on a range of
Pinaceae
(Du
et
al., 2011, 2009; Godbout, Yeh, & Bousquet, 2012) showed that the
seed-transmitted mitochondria which experience little gene flow
introgress more rapidly than the pollen dispersed chloroplasts which
experience high gene flow. A second finding of these studies is that
introgression occurs asymmetrically from the resident species into the
invading species. An expected result of introgression is therefore a
population carrying mitotypes of the former local species and
chlorotypes of the invaded species.
In the case of the population history of L. sibirica and L.
gmelinii in their contact zone, Semerikov et al.
(2013)
found evidence for the asymmetric introgression of L. sibiricamitotypes in a population carrying only L. gmeliniichlorotypes, confirming the natural invasion of L.
gmelinii into the range of L. sibirica. Here we corroborate
these findings with a distinct discrepancy between relatively high rates
of L. sibirica mitotypes as reported before
(Epp
et al.,
2018)
and low rates of L. sibirica in the chloroplast reads found in
this study.
This points to an invasion of L. gmelinii in a former population
of L. sibirica prior to the date of our oldest sample (6700
cal-BP). Further evidence in support of this scenario is found in the
results from a lake sediment core 250 kilometres south-west of the study
site
(Epp
et al., 2018), where samples reaching back to 9300 cal-BP show
exclusively L. sibirica mitotypes, before they were gradually
replaced by the L. gmelinii mitotype.
Our study shows that by capturing for the complete chloroplast genome,
we achieve a high resolution and can detect species specific variants
even at low frequencies. Further studies should also include
mitochondrial sequences in the target enrichment to collect data from
several markers or potentially the complete mitochondrial genome. By
combining the two organelle genomes in a hybridization capture
experiment it would be possible to study hybridization and introgression
events in detail, which would help to deepen our understanding of
population dynamics over long time scales.
Larch forest decline over the last 7000
years
When looking at the overall retrieval of Larix reads among the
samples, most reads could be recovered from the oldest sample, of around
6700 cal-BP and the least number of reads in the most recent sample.
This is in line with the findings of Klemm et al .
(2016)
and Epp et al .
(2018),
who describe a general vegetation turnover from larch forest to an open
tundra with only sparse Larix stands during the last 7000 years
at this site. This gradual change of vegetation in response to
late-Holocene climate deterioration has also been observed by various
studies
(Andreev
et al., 2004; MacDonald, Gervais, Snyder, Tarasov, & Borisova, 2000;
MacDonald, Kremenetski, & Beilman, 2008; Pisaric, MacDonald, Velichko,
& Cwynar,
2001)
and is in correspondence with the reconstructed global cooling trend of
the middle to late Holocene
(Marcott,
Shakun, Clark, & Mix,
2013).