RESULTS
Quality of sequencing data and assembly. A total of 12 samples were used for transcriptome sequencing in this study, half out of which were from N. succinctus and half out of which were from N. variciferus. For each of N. succinctus and N. variciferus , 3 specimens (triple repeat) were toxic as treat group and 3 specimens (triple repeat) were non-toxic as control group. A total of 39.96 Gb bases and 40.19 Gb bases were generated for all specimens of N. succinctusand all specimens of N. variciferus respectively based on Illumina Hiseq sequencing. The clean reads quality metrics after filtering sequences containing low-quality, adaptor-polluted and high content of unknown base (N) reads were shown in Table 1. The distribution of base content and quality were shown in Figure S1. These reads quality indicated that the transcriptome sequencing performed well for biological functional analysis. Accession numbers of 12 samples in Genbank were: SRR10582953, SRR10582959, SRR10582955, SRR10582954, SRR10582956, SRR10582961, SRR10582957, SRR10582958, SRR10582960, SRR10582963, SRR10582963, SRR10582964.
Gene functional annotation. For both N. succinctus andN. variciferus , the Nr and Nt databases got higher annotation proportion than other databases, with more than 30% (Table S1). The overall annotation proportion for N. succinctus and N. variciferus was 52.61% and 55.07% respectively. The distribution of annotated species with NR was shown in Fig. 2. The annotated species were generally consistent between N. succinctus and N. variciferus , including Aplysia californica , Oncorhynchus mykiss , Octopus bimaculoides , Monosiga brevicollis MX1 ,Lottia gigantea , Biomphalaria glabrata , Crassostrea gigas , Trichuris suis , Exaiptasia pallida , Lingula anatine and Mus musculus . Among these species annotated,A. californica showed the highest annotation proportion. The coding sequence (CDS) was selected from segment of unigenes that best mapped to functional databases. For some unigenes which were not annotated, the ESTScan was used to predict the CDS30. For both N. succinctus and N. variciferus , the total number of CDS was between 10000-80000 and the mean length was 300-500bp. There was no much difference for the CDS prediction between the two species.
Gene expression patterns and different expressed genes. After assembly and mapping clean reads to unigenes, the gene expression level for each sample was calculated by PCA analysis. As shown in Fig. 3, the expression level between the toxic and non-toxic specimens were generally different. The non-toxic N. succinctus specimens from Dalian showed more similar gene expression pattern. Two toxic specimens of N. variciferus from Lianyungang had different expression from the non-toxic samples from Dalian.
The different expressed genes (DEGs) revealed up-regulated and down-regulated genes between toxic and non-toxic specimens for both species (Fig. 4). Compared with the non-toxic specimens (ZD) in N. variciferus , each toxic specimens (ZL) showed more up-regulated genes. For N. succinctus , the toxic specimens (HL) also produced more up-regulated genes in comparison with the non-toxic specimens (HQ). The Gene Ontology (GO) classification and functional enrichment was performed for DEGs, including three ontologies of molecular function, cellular component and biological process. N. variciferus andN. succinctus produced coincident patterns of functional enrichment, for which the cellular and metabolic process from biological process, and binding and catalytic activity from molecular function accounted for the top classification categories (Figure S2).
The most common different expressed genes were obtained from toxic (treatment) and non-toxic (control) groups for both species (Table 2). Compared with the non-toxic groups, the most upregulated genes from toxic groups included heat shock protein 83-like, cytochrome c oxidase subunit I and II, protein transport protein Sec24D isoform X2, WAS protein family member 2, delta-aminolevulinic acid dehydratase, and others. The pathways related with the main upregulated genes were protein processing in endoplasmic reticulum, Oxidative phosphorylation, Metabolic pathways, Cardiac muscle contraction, Thermogenesis, Non-alcoholic fatty liver disease (NAFLD), Alzheimer disease, Parkinson disease, Huntington disease, Porphyrin and chlorophyll metabolism Metabolic pathways, and so on. On the other hand, the pathways related with the downregulated genes were mainly about transcription, like spliceosome, RNA transport, basal transcription factors.
For accuracy confirmation of transcriptome sequencing, multiple DEGs found as potential functional genes were selected for qPCR. The qPCR of these selected genes was performed for both toxic specimens (treat group) and non-toxic specimens (control group). The 18S was selected as the reference gene. In general, the statistic results indicated that the 2 -(∆∆Ct) and actual expression level for treat and control groups showed consistent tendency as in the Fig. 5.
Sodium channel genes. Based on the transcript unigenes of both species, we tried to retrieve the sodium channel genes for both species. A total of 3425bp sequence (unigene CL8899.Contig1_All) from N. succinctus was best matched to the sodium channels genes in reference database with identify score with more than 80%. The amino acids of this matched unigene was identified as Domain II and Domain III of sodium channels. Among the amino acid sites, we found one new amino acid ‘L’ in Domain II, in comparison with all other species (Fig. 6). By PCR confirmation, we obtained the same sequence of this new found sodium channels gene from N. succinctus . Unfortunately, no perfect matched sodium channels genes were found from unigenes of N. variciferus even though we blasted the clean reads of both species (150bp) to all reference sequences.