Sequencing output and de novo transcriptome assembly
Illumina sequencing generated a total of 405,026,380 single-end raw reads across the 16 FASTQ files, accounting for 80.4 GB of data (Additional File 1: Table S3). After trimming of adapters and poor-quality reads and removal of rRNA, 175,910,591 reads (43.4% of raw reads) remained, an average of 10,994,412 reads per sample (Additional File 1: Table S3). rRNA therefore made up a large proportion of the total raw reads, as was also observed in a previous RNA sequencing study of epidermal mucus (Greer et al., 2019). The remaining 175 million reads were used for de novo transcriptome assembly. Trinity assembled 268,935 genes with an N50 contig length of 810 bp and an average contig length of 589 bp, based on the longest isoform per gene (Additional File 1: Table S4). Of the 268,935 Trinity genes, 89,209 (33%) were annotated by BLASTX (Additional File 1: Table S4). Analysis of transcriptome completeness with BUSCO identified 67% complete, 28.6% fragmented, and 4.4% missing single-copy orthologs, indicating a fair reconstruction of the transcriptome. The high incidence of fragmented single-copy orthologs is likely because RNA in epidermal mucus was partially degraded (RIN < 7), which may also explain the relatively low percentage of transcripts receiving a BLASTX annotation.
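The summary statistics in this paragraph can be reproduced with a few lines of arithmetic. The Python sketch below is illustrative only and is not the pipeline used in the study: the read and gene counts are taken from the text, whereas the function name `n50` and the list `toy_lengths` are hypothetical and serve only to show how an N50 is conventionally computed (the contig length at which the cumulative sum of lengths, sorted from longest to shortest, first reaches half of the total assembled bases).

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Counts reported in the text (Additional File 1: Tables S3 and S4).
RAW_READS = 405_026_380
CLEAN_READS = 175_910_591
N_SAMPLES = 16
TRINITY_GENES = 268_935
ANNOTATED_GENES = 89_209

# Fraction of raw reads kept after trimming and rRNA removal.
retained_fraction = CLEAN_READS / RAW_READS
# Mean number of retained reads per sample.
mean_reads_per_sample = CLEAN_READS / N_SAMPLES
# Fraction of Trinity genes with a BLASTX annotation.
annotated_fraction = ANNOTATED_GENES / TRINITY_GENES

# Hypothetical contig lengths (bp), NOT data from the study.
toy_lengths = [200, 300, 400, 500, 600, 1000]

print(f"retained reads: {retained_fraction:.1%}")
print(f"mean reads per sample: {mean_reads_per_sample:,.0f}")
print(f"annotated genes: {annotated_fraction:.1%}")
print(f"toy N50: {n50(toy_lengths)} bp")
```

For the toy lengths, the total is 3,000 bp, so the running sum passes the 1,500 bp midpoint at the second-longest contig (1,000 + 600 = 1,600 bp), giving an N50 of 600 bp. Applying the same function to the lengths of the longest isoform per gene would be one way to obtain a figure comparable to the reported 810 bp, assuming the same set of sequences is used.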