Methods

Sample material

Samples were obtained from a sediment core from lake CH12 (72.399°N, 102.289°E, 60 m a.s.l.) in the Khatanga region of the northern Siberian lowlands, located between the Taymyr Peninsula to the north and the Putorana Plateau to the south. The lake’s position is in the northern part of the treeline ecotone and is currently surrounded by a vegetation of single-tree tundra. Details of the chronology of the core are described in Klemm et al. (2016). Four samples were chosen for the present study at depths/ages 121.5 cm/~6700 calibrated years before present (cal-BP), 87.5 cm/~5400 cal-BP, 46.5 cm/~ 1900 cal-BP and 2.5 cm/~60 cal-BP.

Laboratory work

Sampling, DNA extraction and library preparation

Core sub-sampling and DNA extraction were performed as described in Epp et al. (2018). Five µL of each DNA extraction were used in the library preparation. The extractions had DNA concentrations of 2.4 ng/µL (6700 cal-BP), 5.7 ng/µL (5400 cal-BP), 3.4 ng/µL (1900 cal-BP) and 6.7 ng/µL (60 cal-BP). In total, six libraries with volumes of 50 µL each were prepared: four samples from the CH12 core, one extraction blank and one library blank. Libraries were prepared following the single stranded DNA library preparation protocol of Gansauge et al. (2017) with the following adjustment: the ligation of the second adapter (CL53/CL73) took place in a rotating incubator. The libraries were quantified with qPCR as described by Gansauge and Meyer (2013). The prepared libraries were used downstream for both shotgun sequencing and hybridization capture of chloroplast genomes.
Shotgun sequencing
Twenty-four µL of the prepared DNA library were amplified and indexed by PCR with 13 cycles as described in Gansauge and Meyer (2013) using the index primer sequences P5_1 – P5_6 and P7_91 – P7_96. The libraries were pooled in equimolar ratios to a final pool of 10 nM with the two blanks accounting for a molarity of 20% compared to the samples. The sequencing of the pool was performed by Fasteris SA sequencing service (Geneva, Switzerland) using a modified sequencing primer CL72 as described in Gansauge and Meyer (2013). The pool was sequenced on one lane of an Illumina HiSeq 2500 platform (2 x 125 base pairs (bp), High-Output V4 mode).

Bait construction

Eighteen primer pairs for long range PCR products covering the whole chloroplast genome of L. gmelinii were designed as published in Zimmermann et al. (2019). Long-range PCR products were generated from DNA extractions from needles of a L. gmelinii individual collected in the Botanical Gardens of the University of Potsdam (Accession number XX-0POTSD-3867). PCR amplification was conducted using the SequalPrep™ Long PCR Kit (Invitrogen), according to the manufacturer’s cycling protocol instructions, and with specific annealing temperatures for each primer pair (see Zimmermann et al. 2019, Table S2). The PCR products were pooled in equimolar ratios in a volume of 130 µL and sonicated using a Covaris M220 Focused-ultrasonicator (Covaris, Woburn, MA) to a target peak of 350 bp with settings of peak incident power 50 W, duty factor 20%, cycles per burst 200, treatment time 70 seconds. The fragment size and distribution were visualized with Agilent TapeStation (D1000 ScreenTape, Agilent Technologies, Santa Clara, CA). Fragment sizes ranged from 100 bp to 1000 bp with an average size of 370 bp. The complete sonicated pool was purified using the MinElute PCR Purification Kit (Qiagen, Hilden, Germany), following the manufacturer’s recommendations and eluted in 30 µL.

Hybridization capture

Another 24 µL of prepared DNA libraries was PCR amplified with 16 cycles and equimolarly pooled with the blanks having a molarity of 20% compared to the samples. The enrichment was done according to the protocol of Maricic et al. (2010) with the following adjustments: the purified library pool was blunt ended using Fast DNA End Repair Kit (Thermo Scientific) in a volume of 100 µL according to the manufacturer’s recommendations. The adapters were ligated to the baits using the Rapid DNA Ligation Kit (Thermo Scientific) with the following conditions: 15 µL blunt-ended DNA, 1 µL of adapters Bio-T/B (50 µM), 8 µL Quick ligase buffer, 12 µL water and 4 µL quick ligase were mixed and incubated for 15 min at room temperature. In addition to the blocking oligonucleotides provided in the original protocol (Maricic et al., 2010), we used specific oligonucleotides to block the indices used in the double indexed library : BO7.P7.part2.F: ATCTCGTATGCCGTCTTCTGCTTG-phosphate and BO8.P7.part2.R: CAAGCAGAAGACGGCATACGA. Four µL of the final elution were PCR amplified with 18 cycles using the TruSeq DNA Nano preparation kit (Illumina Inc., San Diego, CA), performed by Fasteris SA sequencing service (Geneva, Switzerland). The enriched library pool was sequenced by Fasteris SA sequencing service (Geneva, Switzerland) in the same way as described for shotgun sequencing.

Data analysis

Quality control, trimming and merging of reads

Demultiplexed FASTQ files, as obtained from the sequencing provider, were quality checked using FASTQC (v. 0.7.11, Andrews, 2015) before and after trimming with Trimmomatic (v. 0.35, Bolger, Lohse, & Usadel, 2014). The analyses performed by Trimmomatic relied on a file containing the applied and common Illumina adapters, and the following parameter settings were used: remove adapters with max. mismatch rate = 2, sliding window, window size = 4, average quality = 15, minimum quality to keep a base = 3, minimum length to keep a sequence = 36 nt. Unpaired reads were discarded. Paired reads were merged using PEAR (v. 0.9.10, Zhang, Kobert, Flouri, & Stamatakis, 2014). Merged and unmerged reads were treated in the following steps separately and merged at the end into BAM files with samtools using the command “merge” (v. 1.9, H. Li et al., 2009).

Taxonomical classification

Reads were classified using kraken2, (v. 2.0.7-beta, Wood, Lu, & Langmead, 2019) with a conservative confidence threshold (–confidence 0.9) against the non-redundant nucleotide database (nt) from NCBI (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz; downloaded in May 2019) and the NCBI taxonomy (retrieved via the kraken2-build command in June 2019). This classification was used for the description and comparison of the shotgun as well as the capture dataset. Additionally, the reads were classified using a custom database of 8,132 complete plant chloroplast genomes (downloaded from NCBI in July 2019). In this second classification kraken2 was run with default parameters to allow for the retrieval of variation. Reads classified with the chloroplast database as genus Larix or below were extracted using custom bash scripts and seqtk (Seqtk, n.d.) and used for further analysis.

Alignment

Capture dataset reads classified as Larix by comparison to a custom chloroplast database were mapped against a L. gmeliniireference genome from an individual in the Taymyr region (NCBI accession no.: MK468637.1). Reads were mapped using BWA aln algorithm (v. 0.7.17-r1188, Li & Durbin, 2009) with the settings -l 1024 -o 2 -n 0.001 to ensure relaxed mapping. Duplicate reads were removed using Picard MarkDuplicates (v. 2.20.2-SNAPSHOT, Broad Institute, 2019) for merged reads and samtools markdup for unmerged reads. The same alignment parameters were used to map the complete, taxonomically unclassified samples to the same L. gmelinii chloroplast reference genome.

Coverage of the Larix chloroplast genome at different annotated functions

The annotation of the chloroplast genome was adopted from the published annotation file on NCBI for the used reference genome (Acc.: MK468637.1). Capture dataset alignments of the Larix -classified reads and the complete samples against the Larix chloroplast genome were used for the analysis of different functional groups. Coverages for each sample were obtained using samtools depth (option –a) (Li et al., 2009) and summed up at each position across the samples. Boxplots were made using R with ggplot2 (R Core Team, 2013).

Assessment of ancient DNA damage patterns

Ancient damage patterns were assessed using mapDamage (v. 2.0.8, Jónsson, Ginolhac, Schubert, Johnson, & Orlando, 2013) rescaling the quality scores of the BAM files with the single-stranded option. For damage pattern assessment of unmerged paired-end reads, the reads were mapped in two rounds as single-end reads to the reference (Acc.: MK468637.1) and analysed with mapDamage to not confound the 5’ and 3’ damage patterns. Read length distribution were assessed with mapDamage for overlapping merged reads and with Geneious (v. 2019.2, Biomatters, 2019).

Assignment of Larix bases toL . sibirica or L .gmelinii

To distinguish between the two Larix species L. gmeliniiand L. sibirica and determine their temporal occurrence, we performed multiple comparisons of the 13 complete chloroplast genomes ofL. gmelinii (Accessions MK468630–39, MK468646 and -48, NC_044421) available on NCBI and one of the two available L. sibirica genomes (Accession: MF795085.1) and classified species-specific single nucleotide polymorphisms (SNPs) and insertions and deletions (indels). The alignment was performed using bwa mem with default parameters. SNPs and indels were called using bcftools mpileup (option -B) and call (option -mv) (v. 1.9, H. Li et al., 2009). In total, 294 positions were determined as occurring only in L. sibirica . For each of these positions the above-produced alignments (ancient sample reads against L. gmelinii reference) were analysed with regards to whether the reads carried the same variant as L. gmelinii or L. sibirica or if they had a different SNP or indel at this position. The analysis was carried out using bam-readcount (The McDonnell Genome Institute, 2018) in conjunction with a custom python script which for every position classified the count of reads asL. gmelinii, L. sibirica or “other”.