Gene Expression Analysis
To examine differential gene expression and differential transcript usage among life stage (egg, larvae, and adult), as well as female and male adults, we mapped paired-end short-read RNAseq data to the super-transcriptome using Hisat2 to generate input count matrix for genes and exons within genes. As each RNAseq sample is unreplicated, we chose statistical tools that have performed reliably in differential expression and splicing analyses (Mehmood et al. 2019; Rapaportet al. 2013) and applied stringent significance thresholds based on multiple testing criteria. First, we used edgeR (Robinson et al. 2010) to conduct the differential expression analysis and assumed a recommended dispersion value for within species comparisons (0.16, based on human datasets). The count matrix was filtered by requiring a minimum count of 1 read-per-million per gene in at least three samples. Hierarchical clustering was used to measure the overall sample to sample distance based on differential expression, with read data transformed to log2 counts-per-million (logCPM), or moderated log-counts-per-million, to avoid the effect of undefined values and to shrink poorly expressed genes with low counts to zero. We then identified significantly differentially expressed genes in pairwise contrasts of each life-stage or sex, using Fisher’s Exact Test based on the negative-binomial distribution. Statistical significance in each contrast was determined based on a Benjamini-Hochberg adjustedp-value , or false discovery rate (FDR), of 0.01. Gene lists for the top 25 genes (lowest p-value ) from each pairwise contrast were combined to generate a heatmap using the pheatmap package in R. For the egg vs. larvae, larvae vs. adult female, and adult female vs. male contrasts, we examined the biological function for all significantly differentially expressed genes (FDR < 0.01) using a gene set enrichment analysis (GSEA) based on gene ontology (GO) terms. The goseq package (Young et al. 2010) in R was used to identify GO terms that were enriched by comparing the subset of differentially expressed genes to all genes that were expressed in either the female or male samples. The Wallenius approximation was then used to calculate statistical significance based on an FDR of 0.05, while taking gene length into account using an empirically estimated probability weighting function.
We conducted another GSEA analysis based on genes exhibiting differential exon usage in the female vs. male contrast. A generalized log-linear model was fit to the exon count data, and log-fold differences among exons in each multi-exon gene were assessed using thediffSplice function in the limma package in R (Ritchie et al. 2015). Differences in expression among exons were first assessed using t-tests, and then the Simes method was used to provide a multiple testing correction for a gene-wise measure of significance. We used the Simes p-value with a significance threshold of 0.01 to generate a gene list for GSEA analysis, and the background list included all multi-exon genes.