2.7 Read mapping and counting
As a reference for gene expression quantification, we generated and
annotated a de novo transcriptome assembly of I. pygmaeus CNS and eye tissues from long-read PacBio Iso-Seq data. Methods for
Iso-Seq, de novo transcriptome assembly, and transcriptome
annotation are described in Supplementary Text S2. The trimmed and decontaminated RNA-seq reads were mapped
against the transcriptome assembly using salmon (v1.3.0) (Patro et
al., 2017). Correction for sequence-specific biases (--seqBias) and fragment-level
GC biases (--gcBias) was enabled, the quantification step was skipped (--skipQuant), and the
--validateMappings and --hardFilter flags were set. Corset
(v1.09) (Davidson & Oshlack, 2014) was run on the salmon equivalence
class files from all 40 samples to cluster transcripts into gene-level
clusters and produce gene-level counts. In Corset, we specified the four
groups (eyes at current-day CO₂, eyes at elevated
CO₂, CNS at current-day CO₂, and CNS at
elevated CO₂), switched off the log likelihood ratio test
to prevent differentially expressed transcripts from being split
into different clusters, and removed links between contigs
supported by fewer than 10 reads.
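The mapping and clustering settings above can be summarised as command lines. The Python sketch below only builds argument lists (it does not run anything) and is not the authors' script. Several details are assumptions not stated in the text: paired-end reads, automatic library-type detection (`-l A`), and `--dumpEq`, which salmon needs in order to write the equivalence-class files that Corset reads. The Corset options (`-i salmon_eq_classes`, `-g`, `-n`, `-D`, `-l`) reflect our reading of the Corset documentation; in particular, switching off the likelihood ratio test by passing a very large `-D` threshold is an assumption to check against the usage text of the installed version. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def salmon_cmd(index: str, out_dir: str, read1: str, read2: str) -> list[str]:
    """Hypothetical helper: argument list for one sample's salmon run (paired-end assumed)."""
    return [
        "salmon", "quant",
        "-i", index,
        "-l", "A",                # assumed: automatic library-type detection
        "-1", read1, "-2", read2,
        "--seqBias",              # sequence-specific bias correction
        "--gcBias",               # fragment-level GC bias correction
        "--validateMappings",
        "--hardFilter",
        "--skipQuant",            # skip the quantification step
        "--dumpEq",               # assumed: write equivalence classes for Corset
        "-o", out_dir,
    ]


def corset_cmd(eq_class_files: Sequence[str], groups: Sequence[str],
               sample_names: Sequence[str], min_link_reads: int = 10,
               lrt_threshold: str = "99999999999") -> list[str]:
    """Hypothetical helper: argument list for clustering all samples with Corset."""
    if not (len(eq_class_files) == len(groups) == len(sample_names)):
        raise ValueError("need one group label and one sample name per eq-class file")
    return [
        "corset",
        "-i", "salmon_eq_classes",     # input is salmon equivalence-class files
        "-g", ",".join(groups),        # treatment group of each sample, in file order
        "-n", ",".join(sample_names),  # sample names, in file order
        "-D", lrt_threshold,           # assumed: a huge LRT threshold switches the test off
        "-l", str(min_link_reads),     # drop contig links supported by fewer reads
        *eq_class_files,
    ]
```

In use, each list could be passed to `subprocess.run(..., check=True)`: once per sample for salmon, then once for Corset with all 40 equivalence-class files and one of four group labels per file (labels such as `eye_current` or `cns_elevated` would be hypothetical names for the four treatments).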
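To illustrate how the link filter shapes the gene-level clusters, the Python sketch below reads salmon equivalence classes, counts the reads shared by each pair of transcripts, discards links supported by fewer than 10 reads, and groups the remaining linked transcripts. It is a deliberate simplification rather than Corset's algorithm: with the likelihood ratio test switched off we approximate Corset's hierarchical clustering by connected components of the filtered read-sharing graph. Other assumptions: the plain-text `eq_classes.txt` layout without weights, link support summed across samples, and reads from a class that spans several clusters split evenly among them. All function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def parse_eq_classes(lines):
    """Parse salmon's eq_classes.txt layout without weights: number of
    transcripts, number of classes, one transcript name per line, then one
    line per class: 'k idx_1 ... idx_k count'."""
    rows = iter([ln.strip() for ln in lines if ln.strip()])
    n_tx, n_cls = int(next(rows)), int(next(rows))
    names = [next(rows) for _ in range(n_tx)]
    classes = []
    for _ in range(n_cls):
        fields = [int(x) for x in next(rows).split()]
        k = fields[0]
        classes.append((fields[1:1 + k], fields[-1]))
    return names, classes


def cluster_transcripts(samples, min_link_reads=10):
    """Connected components of the read-sharing graph after dropping weak links.
    `samples` is an iterable of (names, classes) pairs from parse_eq_classes."""
    support = defaultdict(int)   # (tx_a, tx_b) -> shared reads, summed over samples
    all_tx = set()
    for names, classes in samples:
        all_tx.update(names)
        for idxs, count in classes:
            for a, b in combinations(sorted({names[i] for i in idxs}), 2):
                support[(a, b)] += count

    parent = {t: t for t in all_tx}  # union-find forest

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), reads in support.items():
        if reads >= min_link_reads:      # keep only well-supported links
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for t in sorted(all_tx):
        clusters[find(t)].append(t)
    return sorted(clusters.values())


def cluster_counts(names, classes, clusters):
    """Gene-level counts for one sample; a class spanning several clusters
    is split evenly among them (an assumption, not Corset's documented rule)."""
    tx_to_cluster = {t: i for i, members in enumerate(clusters) for t in members}
    counts = [0.0] * len(clusters)
    for idxs, count in classes:
        hit = {tx_to_cluster[names[i]] for i in idxs}
        for c in hit:
            counts[c] += count / len(hit)
    return counts
```

For a toy sample in which transcripts tA and tB share 12 reads and tB and tC share 4, the 10-read cutoff keeps the tA/tB link and drops the tB/tC link, giving the clusters `[["tA", "tB"], ["tC"]]`; with a cutoff of 1 all three transcripts would fall into one cluster.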