Sequencing output and de novo transcriptome assembly
Illumina sequencing generated a total of 405,026,380 single-end
raw reads across the 16 FASTQ files, accounting for 80.4 GB of data
(Additional File 1: Table S3). A total of 175,910,591 reads remained
following trimming of adapters and poor-quality reads and removal of
rRNA (Additional File 1: Table S3), an average of 10,994,412 reads per
sample. Only 43.4% of raw reads were retained, indicating that rRNA
made up a large proportion of total raw reads, as was observed in a
previous RNA sequencing study of epidermal mucus (Greer et al., 2019).
The remaining 175.9 million reads were used for de novo transcriptome
assembly. Trinity assembled 268,935 genes with an N50 contig length of
810 bp and an average contig length of 589 bp based on the longest
isoform per gene (Additional File 1: Table S4). Of the 268,935 Trinity
genes, 89,209 (33%) were annotated by BLASTX (Additional File 1: Table
S4). Analysis of transcriptome completeness with BUSCO found 67%
complete, 28.6% fragmented, and 4.4% missing single-copy orthologs,
indicating a fair reconstruction of the transcriptome. The incidence of
fragmented single-copy orthologs likely reflects partial degradation of
RNA in epidermal mucus (RIN < 7), which may also explain the relatively
low percentage of transcripts receiving a BLASTX annotation.
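
For readers unfamiliar with the summary statistics reported above, the following is a minimal Python sketch of how an N50 and a mean contig length can be derived from a list of contig lengths, and how the per-sample average and retained fraction follow from the read counts in the text. It is illustrative only and is not the pipeline used in this study; the function name `n50` and the toy contig lengths are hypothetical, and only the read counts and sample number are taken from the text.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in bp (hypothetical, not from the assembly).
toy_lengths = [200, 300, 400, 500, 600, 1000]
toy_n50 = n50(toy_lengths)
toy_mean = sum(toy_lengths) / len(toy_lengths)

# Read counts and sample number as reported in the text.
raw_reads = 405_026_380
clean_reads = 175_910_591
n_samples = 16

retained_fraction = clean_reads / raw_reads
mean_reads_per_sample = round(clean_reads / n_samples)

print(f"toy N50 = {toy_n50} bp, toy mean = {toy_mean:.0f} bp")
print(f"retained = {retained_fraction:.1%}, "
      f"mean reads per sample = {mean_reads_per_sample:,}")
```

Because N50 is weighted by contig length, it is usually larger than the mean contig length, which is consistent with the 810 bp N50 and 589 bp mean reported for this assembly.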