2.5 Transcriptomic analysis
CAext and AE proteins were searched for within a
transcriptome dataset obtained from O. alismoides acclimated to
LC and HC (Huang et al., 2018). Information on the different
CO2 treatments is shown in Supplementary Data Table S1.
Six samples of mature leaves (three HC-acclimated and three LC-acclimated) were used
for second-generation sequencing (SGS), which yields short but high-accuracy reads
(Hackl et al., 2014). Six other samples were used for
third-generation sequencing (TGS), which yields longer but lower-accuracy
reads (Roberts et al., 2013).
Approximately 0.3 g fresh weight of leaves was collected 30 minutes before the
end of the photoperiod, flash frozen in liquid N2 and
stored at -80°C before use. Total RNA was extracted using the commercial
RNAiso kit (Takara Biotechnology, Dalian, China). The purified RNA was
dissolved in RNase-free water, with genomic DNA contamination removed
using TURBO DNase I (Promega, Beijing, China). RNA quality was checked
with the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto,
California). Only total RNA samples with an RNA integrity number ≥8
were used to construct the cDNA libraries for PacBio or Illumina HiSeq
sequencing.
For TGS analysis, total RNA (2 μg) was reverse transcribed into cDNA
using the SMARTer PCR cDNA Synthesis Kit (Takara Biotechnology, Dalian,
China), which is optimized for preparing high-quality, full-length cDNAs,
followed by size fractionation using the BluePippin™ Size
Selection System (Sage Science, Beverly, MA). Each SMRTbell library was
constructed using 1-2 μg size-selected cDNA with the Pacific Biosciences
DNA Template Prep Kit 2.0. SMRT sequencing was then performed on the
Pacific Biosciences Sequel platform using the manufacturer’s protocol.
For SGS analysis, cDNA libraries were constructed using a NEBNext®
Ultra™ RNA Library Prep Kit for Illumina® (NEB, Beverly, MA, USA),
following the manufacturer’s protocol. Qualified libraries were
sequenced, and 150 bp paired-end reads were generated (Illumina HiSeq
2500, San Diego, CA, USA).
The TGS subreads were filtered using the standard protocols in the SMRT
analysis software suite (http://www.pacificbiosciences.com) and reads of
insert (ROIs) were generated. Full-length non-chimeric reads (FLNC) and
non-full-length cDNA reads (NFL) were distinguished by detecting the
poly(A) signal and the 5’ and 3’ adaptors. The FLNC reads
were clustered and polished by the Quiver program with the assistance of
NFL reads, producing high-quality isoforms (HQ) and low-quality isoforms
(LQ). The raw Illumina reads were filtered to remove reads containing
ambiguous ‘N’ bases, adaptor sequences and low-quality reads. Filtered
Illumina data were then used to polish the LQ reads using the proovread
213.841 software. Redundant isoforms were then removed with the
program CD-HIT to generate a high-quality transcript dataset for
O. alismoides.
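The Illumina read-filtering step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the adaptor string, the mean-quality threshold of 20, the Phred+33 encoding and the function names are all assumptions chosen for illustration, and real pipelines usually rely on dedicated trimming tools.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (read name, base sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding is assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(
    reads: Iterable[Read],
    adaptor: str,
    min_mean_q: float = 20.0,  # hypothetical threshold, not taken from the source
) -> Iterator[Read]:
    """Yield only reads that pass the three filters named in the text."""
    for name, seq, qual in reads:
        if not seq or not qual:
            continue  # skip empty records
        if "N" in seq.upper():
            continue  # ambiguous base call
        if adaptor in seq:
            continue  # adaptor sequence still present
        if mean_phred(qual) < min_mean_q:
            continue  # low-quality read
        yield name, seq, qual


# Illustrative adaptor fragment and toy reads (all hypothetical).
ADAPTOR = "AGATCGGAAG"
READS = [
    ("r1", "ACGTACGT", "IIIIIIII"),              # clean, high quality
    ("r2", "ACGNACGT", "IIIIIIII"),              # contains 'N'
    ("r3", "ACGTAGATCGGAAG", "IIIIIIIIIIIIII"),  # contains the adaptor
    ("r4", "ACGTACGT", "########"),              # mean Phred score of 2
]

KEPT = [name for name, _, _ in filter_reads(READS, ADAPTOR)]
print(KEPT)
```

In this toy input only read r1 passes all three filters.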
TransDecoder v2.0.1 (https://transdecoder.github.io/) was used to define
the putative coding sequence (CDS) of these transcripts. The predicted
CDS were then functionally annotated and confirmed by BLAST searches
against the following databases: NR, NT, KOG, COG, KEGG,
Swiss-Prot and GO. For each transcript in each database searched, the
functional information of the best matched sequence was assigned to the
query transcript. A phylogenetic tree of αCA-1 isoforms, based on
deduced CA peptide sequences from NCBI, was constructed with Geneious
software (Windows version 11.0, Biomatters Ltd, New Zealand). The
subcellular location of the protein was predicted using Target P1 (Emanuelsson et
al., 2007; http://www.cbs.dtu.dk/services/TargetP/).
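The best-match annotation rule described for the BLAST searches can be sketched as follows. The sketch assumes BLAST tabular output with the default 12 columns (bit score in the last column) and takes "best matched" to mean the highest bit score; the source does not state which criterion was used, and the transcript and subject identifiers are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple


def best_hits(tabular: str) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to (subject ID, bit score) of its highest-scoring hit.

    Assumes the default 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        # Keep the hit only if it beats the best score seen so far for this query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Toy tabular output with hypothetical identifiers.
EXAMPLE = "\n".join([
    "t1\tsubjA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150.0",
    "t1\tsubjB\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t200.0",
    "t2\tsubjC\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-5\t45.5",
])

ANNOTATION = best_hits(EXAMPLE)
print(ANNOTATION)
```

Running the function once per database and carrying over the functional description of each retained subject would give one annotation per transcript per database, as in the text.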