2.5 Transcriptomic analysis
CAext and AE proteins were searched for within a transcriptome dataset obtained from O. alismoides acclimated to LC and HC (Huang et al., 2018). Information on the different CO2 treatments is given in Supplementary Data Table S1. Six samples (mature leaves, three acclimated to HC and three to LC) were used for second-generation sequencing (SGS), which yields short but high-accuracy reads (Hackl et al., 2014). Six other samples were used for third-generation sequencing (TGS), which yields longer but lower-accuracy reads (Roberts et al., 2013).
Approximately 0.3 g fresh weight of leaves was collected 30 minutes before the end of the photoperiod, flash-frozen in liquid N2 and stored at -80°C until use. Total RNA was extracted using the commercial kit RNAiso (Takara Biotechnology, Dalian, China). The purified RNA was dissolved in RNase-free water, and genomic DNA contamination was removed using TURBO DNase I (Promega, Beijing, China). RNA quality was checked with the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, California). Only total RNA samples with RNA integrity numbers ≥8 were used to construct the cDNA libraries for PacBio or Illumina HiSeq sequencing.
For TGS analysis, total RNA (2 μg) was reverse-transcribed into cDNA using the SMARTer PCR cDNA Synthesis Kit (Takara Biotechnology, Dalian, China), which is optimized for preparing high-quality, full-length cDNAs, followed by size fractionation using the BluePippin™ Size Selection System (Sage Science, Beverly, MA). Each SMRTbell library was constructed from 1-2 μg of size-selected cDNA with the Pacific Biosciences DNA Template Prep Kit 2.0. SMRT sequencing was then performed on the Pacific Biosciences Sequel platform according to the manufacturer’s protocol.
For SGS analysis, cDNA libraries were constructed using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, Beverly, MA, USA), following the manufacturer’s protocol. Qualified libraries were sequenced on an Illumina HiSeq 2500 (Illumina, San Diego, CA, USA) to generate 150 bp paired-end reads.
The TGS subreads were filtered using the standard protocols in the SMRT analysis software suite (http://www.pacificbiosciences.com) to generate reads of insert (ROIs). Full-length non-chimeric reads (FLNC) and non-full-length cDNA reads (NFL) were identified by the presence of the poly(A) signal and the 5’ and 3’ adaptors. The FLNC reads were clustered and then polished with the Quiver program, with the assistance of the NFL reads, producing high-quality isoforms (HQ) and low-quality isoforms (LQ). The raw Illumina reads were filtered to remove ambiguous reads containing ‘N’ bases, adaptor sequences and low-quality reads. The filtered Illumina data were then used to polish the LQ isoforms using the proovread 213.841 software. Finally, redundant isoforms were removed with the program CD-HIT to generate a high-quality transcript dataset for O. alismoides.
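The three Illumina read filters described above (ambiguous bases, adaptor sequences, low quality) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the mean-quality threshold of 20, the Phred+33 encoding and the adaptor string are assumptions chosen for illustration, and reads containing the adaptor are simply discarded here rather than trimmed.

```python
from typing import Iterable, Iterator, Tuple

# A read is represented as (read id, base sequence, quality string).
FastqRecord = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(
    records: Iterable[FastqRecord],
    adaptor: str = "AGATCGGAAGAGC",   # assumed adaptor prefix, not from the text
    min_mean_quality: float = 20.0,   # assumed threshold, not from the text
) -> Iterator[FastqRecord]:
    """Yield only reads that pass all three filters."""
    for read_id, seq, qual in records:
        if not seq or not qual:
            continue  # skip empty records
        bases = seq.upper()
        if "N" in bases:
            continue  # ambiguous read
        if adaptor in bases:
            continue  # adaptor contamination (discarded, not trimmed)
        if mean_phred(qual) < min_mean_quality:
            continue  # low-quality read
        yield read_id, seq, qual


reads = [
    ("r1", "ACGTACGT", "IIIIIIII"),                   # clean read
    ("r2", "ACGNACGT", "IIIIIIII"),                   # contains N
    ("r3", "ACGTAGATCGGAAGAGCAC", "I" * 19),          # contains adaptor
    ("r4", "ACGTACGT", "!!!!!!!!"),                   # Phred 0 throughout
]
kept_ids = [read_id for read_id, _, _ in filter_reads(reads)]
```

In practice a dedicated read-trimming tool would be used for this step; the sketch only makes the order and logic of the filters explicit.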
TransDecoder v2.0.1 (https://transdecoder.github.io/) was used to define the putative coding sequence (CDS) of these transcripts. The predicted CDS were then functionally annotated and confirmed by BLAST searches against the following databases: NR, NT, KOG, COG, KEGG, Swissprot and GO. For each transcript and each database searched, the functional information of the best-matching sequence was assigned to the query transcript. A phylogenetic tree of αCA-1 isoforms, based on deduced CA peptide sequences from NCBI, was constructed with Geneious software (Windows version 11.0, Biomatters Ltd, New Zealand). The subcellular location of the protein was predicted using Target P1 (Emanuelsson et al., 2007; http://www.cbs.dtu.dk/services/TargetP/).
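The best-hit assignment described above can be sketched as follows, assuming BLAST results in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, e-value in column 11, bit score in column 12). The text does not state the criterion used to define the best match; this sketch assumes the highest bit score, with ties broken by the lower e-value. The function name, record type and example identifiers are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Return the best hit per query from 12-column tabular BLAST output."""
    best: Dict[str, Hit] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit = Hit(fields[0], fields[1], float(fields[10]), float(fields[11]))
        current = best.get(hit.query)
        # Assumed criterion: higher bit score wins, then lower e-value.
        if current is None or (hit.bitscore, -hit.evalue) > (
            current.bitscore,
            -current.evalue,
        ):
            best[hit.query] = hit
    return best


# Hypothetical example rows (identifiers and scores are made up).
example = [
    "tx1\tsubjA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150.0",
    "tx1\tsubjB\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t210.0",
    "tx2\tsubjC\t85.0\t80\t12\t0\t1\t80\t1\t80\t1e-10\t70.0",
]
annotation = best_hits(example)
```

Running the function once per database and storing the subject description of each best hit would reproduce the per-database annotation scheme described in the text.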