Methods
Sample material
Samples were obtained from a sediment core from lake CH12 (72.399°N,
102.289°E, 60 m a.s.l.) in the Khatanga region of the northern Siberian
lowlands, located between the Taymyr Peninsula to the north and the
Putorana Plateau to the south. The lake’s position is in the northern
part of the treeline ecotone and is currently surrounded by a vegetation
of single-tree tundra. Details of the chronology of the core are
described in Klemm et al.
(2016).
Four samples were chosen for the present study at depths/ages 121.5
cm/~6700 calibrated years before present (cal-BP), 87.5
cm/~5400 cal-BP, 46.5 cm/~ 1900 cal-BP
and 2.5 cm/~60 cal-BP.
Laboratory work
Sampling, DNA extraction and library
preparation
Core sub-sampling and DNA extraction were performed as described in Epp
et al.
(2018).
Five µL of each DNA extraction were used in the library preparation. The
extractions had DNA concentrations of 2.4 ng/µL (6700 cal-BP), 5.7 ng/µL
(5400 cal-BP), 3.4 ng/µL (1900 cal-BP) and 6.7 ng/µL (60 cal-BP). In
total, six libraries with volumes of 50 µL each were prepared: four
samples from the CH12 core, one extraction blank and one library blank.
Libraries were prepared following the single stranded DNA library
preparation protocol of Gansauge et al.
(2017)
with the following adjustment: the ligation of the second adapter
(CL53/CL73) took place in a rotating incubator. The libraries were
quantified with qPCR as described by
Gansauge
and Meyer (2013). The prepared libraries were used downstream for both
shotgun sequencing and hybridization capture of chloroplast genomes.
Shotgun sequencing
Twenty-four µL of the prepared DNA library were amplified and indexed by
PCR with 13 cycles as described in
Gansauge
and Meyer
(2013)
using the index primer sequences P5_1 – P5_6 and P7_91 – P7_96.
The libraries were pooled in equimolar ratios to a final pool of 10 nM
with the two blanks accounting for a molarity of 20% compared to the
samples. The sequencing of the pool was performed by Fasteris SA
sequencing service (Geneva, Switzerland) using a modified sequencing
primer CL72 as described in
Gansauge
and Meyer
(2013).
The pool was sequenced on one lane of an Illumina HiSeq 2500 platform (2
x 125 base pairs (bp), High-Output V4 mode).
Bait construction
Eighteen primer pairs for long range PCR products covering the whole
chloroplast genome of L. gmelinii were designed as published in
Zimmermann
et al. (2019). Long-range PCR products were generated from DNA
extractions from needles of a L. gmelinii individual collected in
the Botanical Gardens of the University of Potsdam (Accession number
XX-0POTSD-3867). PCR amplification was conducted using the SequalPrep™
Long PCR Kit (Invitrogen), according to the manufacturer’s cycling
protocol instructions, and with specific annealing temperatures for each
primer pair (see Zimmermann et al. 2019, Table S2). The PCR products
were pooled in equimolar ratios in a volume of 130 µL and sonicated
using a Covaris M220 Focused-ultrasonicator (Covaris, Woburn, MA) to a
target peak of 350 bp with settings of peak incident power 50 W, duty
factor 20%, cycles per burst 200, treatment time 70 seconds. The
fragment size and distribution were visualized with Agilent TapeStation
(D1000 ScreenTape, Agilent Technologies, Santa Clara, CA). Fragment
sizes ranged from 100 bp to 1000 bp with an average size of 370 bp. The
complete sonicated pool was purified using the MinElute PCR Purification
Kit (Qiagen, Hilden, Germany), following the manufacturer’s
recommendations and eluted in 30 µL.
Hybridization capture
Another 24 µL of prepared DNA libraries was PCR amplified with 16 cycles
and equimolarly pooled with the blanks having a molarity of 20%
compared to the samples. The enrichment was done according to the
protocol of Maricic et al. (2010) with the following adjustments: the
purified library pool was blunt ended using Fast DNA End Repair Kit
(Thermo Scientific) in a volume of 100 µL according to the
manufacturer’s recommendations. The adapters were ligated to the baits
using the Rapid DNA Ligation Kit (Thermo Scientific) with the following
conditions: 15 µL blunt-ended DNA, 1 µL of adapters Bio-T/B (50 µM), 8
µL Quick ligase buffer, 12 µL water and 4 µL quick ligase were mixed and
incubated for 15 min at room temperature. In addition to the blocking
oligonucleotides provided in the original protocol
(Maricic
et al., 2010), we used specific oligonucleotides to block the indices
used in the double indexed library : BO7.P7.part2.F:
ATCTCGTATGCCGTCTTCTGCTTG-phosphate and BO8.P7.part2.R:
CAAGCAGAAGACGGCATACGA. Four µL of the final elution were PCR amplified
with 18 cycles using the TruSeq DNA Nano preparation kit (Illumina Inc.,
San Diego, CA), performed by Fasteris SA sequencing service (Geneva,
Switzerland). The enriched library pool was sequenced by Fasteris SA
sequencing service (Geneva, Switzerland) in the same way as described
for shotgun sequencing.
Data analysis
Quality control, trimming and merging of
reads
Demultiplexed FASTQ files, as obtained from the sequencing provider,
were quality checked using FASTQC
(v.
0.7.11,
Andrews, 2015) before and after trimming with Trimmomatic
(v.
0.35,
Bolger, Lohse, & Usadel, 2014). The analyses performed by Trimmomatic
relied on a file containing the applied and common Illumina adapters,
and the following parameter settings were used: remove adapters with
max. mismatch rate = 2, sliding window, window size = 4, average quality
= 15, minimum quality to keep a base = 3, minimum length to keep a
sequence = 36 nt. Unpaired reads were discarded. Paired reads were
merged using PEAR
(v.
0.9.10,
Zhang, Kobert, Flouri, & Stamatakis, 2014). Merged and unmerged reads
were treated in the following steps separately and merged at the end
into BAM files with samtools using the command “merge”
(v.
1.9,
H. Li et al., 2009).
Taxonomical
classification
Reads were classified using kraken2,
(v.
2.0.7-beta,
Wood, Lu, & Langmead, 2019) with a conservative confidence threshold
(–confidence 0.9) against the non-redundant nucleotide database (nt)
from NCBI
(ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz;
downloaded in May 2019) and the NCBI taxonomy (retrieved via the
kraken2-build command in June 2019). This classification was used for
the description and comparison of the shotgun as well as the capture
dataset. Additionally, the reads were classified using a custom database
of 8,132 complete plant chloroplast genomes (downloaded from NCBI in
July 2019). In this second classification kraken2 was run with default
parameters to allow for the retrieval of variation. Reads classified
with the chloroplast database as genus Larix or below were
extracted using custom bash scripts and seqtk
(Seqtk,
n.d.) and used for further analysis.
Alignment
Capture dataset reads classified as Larix by comparison to a
custom chloroplast database were mapped against a L. gmeliniireference genome from an individual in the Taymyr region (NCBI accession
no.: MK468637.1). Reads were mapped using BWA aln algorithm
(v.
0.7.17-r1188,
Li & Durbin, 2009) with the settings -l 1024 -o 2 -n 0.001 to ensure
relaxed mapping. Duplicate reads were removed using Picard
MarkDuplicates
(v.
2.20.2-SNAPSHOT,
Broad Institute, 2019) for merged reads and samtools markdup for
unmerged reads. The same alignment parameters were used to map the
complete, taxonomically unclassified samples to the same L.
gmelinii chloroplast reference genome.
Coverage of the Larix chloroplast
genome at different annotated
functions
The annotation of the chloroplast genome was adopted from the published
annotation file on NCBI for the used reference genome (Acc.:
MK468637.1). Capture dataset alignments of the Larix -classified
reads and the complete samples against the Larix chloroplast
genome were used for the analysis of different functional groups.
Coverages for each sample were obtained using samtools depth (option
–a)
(Li
et al., 2009) and summed up at each position across the samples.
Boxplots were made using R with ggplot2
(R
Core Team, 2013).
Assessment of ancient DNA damage
patterns
Ancient damage patterns were assessed using mapDamage
(v.
2.0.8,
Jónsson, Ginolhac, Schubert, Johnson, & Orlando, 2013) rescaling the
quality scores of the BAM files with the single-stranded option. For
damage pattern assessment of unmerged paired-end reads, the reads were
mapped in two rounds as single-end reads to the reference (Acc.:
MK468637.1) and analysed with mapDamage to not confound the 5’ and 3’
damage patterns. Read length distribution were assessed with mapDamage
for overlapping merged reads and with Geneious
(v.
2019.2, Biomatters, 2019).
Assignment of Larix bases toL . sibirica or L .gmelinii
To distinguish between the two Larix species L. gmeliniiand L. sibirica and determine their temporal occurrence, we
performed multiple comparisons of the 13 complete chloroplast genomes ofL. gmelinii (Accessions MK468630–39, MK468646 and -48,
NC_044421) available on NCBI and one of the two available L.
sibirica genomes (Accession: MF795085.1) and classified
species-specific single nucleotide polymorphisms (SNPs) and insertions
and deletions (indels). The alignment was performed using bwa mem with
default parameters. SNPs and indels were called using bcftools mpileup
(option
-B)
and call (option -mv)
(v.
1.9,
H. Li et al., 2009). In total, 294 positions were determined as
occurring only in L. sibirica . For each of these positions the
above-produced alignments (ancient sample reads against L.
gmelinii reference) were analysed with regards to whether the reads
carried the same variant as L. gmelinii or L. sibirica or
if they had a different SNP or indel at this position. The analysis was
carried out using bam-readcount
(The
McDonnell Genome Institute, 2018) in conjunction with a custom python
script which for every position classified the count of reads asL. gmelinii, L. sibirica or “other”.