RESULTS
Quality of sequencing data
and assembly. A total of 12 samples were used for transcriptome
sequencing in this study, half out of which were from N.
succinctus and half out of which were from N. variciferus. For
each of N. succinctus and N. variciferus , 3 specimens
(triple repeat) were toxic as treat group and 3 specimens (triple
repeat) were non-toxic as control group. A total of 39.96 Gb bases and
40.19 Gb bases were generated for all specimens of N. succinctusand all specimens of N. variciferus respectively based on
Illumina Hiseq sequencing. The clean reads quality metrics after
filtering sequences containing low-quality, adaptor-polluted and high
content of unknown base (N) reads were shown in Table 1. The
distribution of base content and quality were shown in Figure S1. These
reads quality indicated that the transcriptome sequencing performed well
for biological functional analysis. Accession numbers of 12 samples in
Genbank were: SRR10582953, SRR10582959, SRR10582955, SRR10582954,
SRR10582956, SRR10582961, SRR10582957, SRR10582958, SRR10582960,
SRR10582963, SRR10582963, SRR10582964.
Gene functional annotation. For both N. succinctus andN. variciferus , the Nr and Nt databases got higher annotation
proportion than other databases, with more than 30% (Table S1). The
overall annotation proportion for N. succinctus and N.
variciferus was 52.61% and 55.07% respectively. The distribution of
annotated species with NR was shown in Fig. 2. The annotated species
were generally consistent between N. succinctus and N.
variciferus , including Aplysia californica , Oncorhynchus
mykiss , Octopus bimaculoides , Monosiga brevicollis MX1 ,Lottia gigantea , Biomphalaria glabrata , Crassostrea
gigas , Trichuris suis , Exaiptasia pallida , Lingula
anatine and Mus musculus . Among these species annotated,A. californica showed the highest annotation proportion. The
coding sequence (CDS) was selected from segment of unigenes that best
mapped to functional databases. For some unigenes which were not
annotated, the ESTScan was used to predict the CDS30.
For both N. succinctus and N. variciferus , the total
number of CDS was between 10000-80000 and the mean length was 300-500bp.
There was no much difference for the CDS prediction between the two
species.
Gene expression patterns and different expressed genes. After
assembly and mapping clean reads to unigenes, the gene expression level
for each sample was calculated by PCA analysis. As shown in Fig. 3, the
expression level between the toxic and non-toxic specimens were
generally different. The non-toxic N. succinctus specimens
from Dalian showed more similar gene expression pattern. Two toxic
specimens of N. variciferus from Lianyungang had different
expression from the non-toxic samples from Dalian.
The different expressed genes (DEGs) revealed up-regulated and
down-regulated genes between toxic and non-toxic specimens for both
species (Fig. 4). Compared with the non-toxic specimens (ZD) in N.
variciferus , each toxic specimens (ZL) showed more up-regulated genes.
For N. succinctus , the toxic specimens (HL) also produced more
up-regulated genes in comparison with the non-toxic specimens (HQ). The
Gene Ontology (GO) classification and functional enrichment was
performed for DEGs, including three ontologies of molecular function,
cellular component and biological process. N. variciferus andN. succinctus produced coincident patterns of functional
enrichment, for which the cellular and metabolic process from biological
process, and binding and catalytic activity from molecular function
accounted for the top classification categories (Figure S2).
The most common different expressed genes were obtained from toxic
(treatment) and non-toxic (control) groups for both species (Table 2).
Compared with the non-toxic groups, the most upregulated genes from
toxic groups included heat shock protein 83-like, cytochrome c oxidase
subunit I and II, protein transport protein Sec24D isoform X2, WAS
protein family member 2, delta-aminolevulinic acid dehydratase, and
others. The pathways related with the main upregulated genes were
protein processing in endoplasmic reticulum, Oxidative phosphorylation,
Metabolic pathways, Cardiac muscle contraction, Thermogenesis,
Non-alcoholic fatty liver disease (NAFLD), Alzheimer disease, Parkinson
disease, Huntington disease, Porphyrin and chlorophyll metabolism
Metabolic pathways, and so on. On the other hand, the pathways related
with the downregulated genes were mainly about transcription, like
spliceosome, RNA transport, basal transcription factors.
For accuracy confirmation of transcriptome sequencing, multiple DEGs
found as potential functional genes were selected for qPCR. The qPCR of
these selected genes was performed for both toxic specimens (treat
group) and non-toxic specimens (control group). The 18S was selected as
the reference gene. In general, the statistic results indicated that the
2 -(∆∆Ct) and actual expression level for treat and control groups
showed consistent tendency as in the Fig. 5.
Sodium channel genes. Based on the transcript unigenes of both
species, we tried to retrieve the sodium channel genes for both species.
A total of 3425bp sequence (unigene CL8899.Contig1_All) from N.
succinctus was best matched to the sodium channels genes in reference
database with identify score with more than 80%. The amino acids of
this matched unigene was identified as Domain II and Domain III of
sodium channels. Among the amino acid sites, we found one new amino acid
‘L’ in Domain II, in comparison with all other species (Fig. 6). By PCR
confirmation, we obtained the same sequence of this new found sodium
channels gene from N. succinctus . Unfortunately, no perfect
matched sodium channels genes were found from unigenes of N.
variciferus even though we blasted the clean reads of both species
(150bp) to all reference sequences.