2.8.3 Correlating gene expression profiles with CO2 treatment and OA-affected behaviours
To analyse the correlation between gene expression and behavioural traits of squid across CO2 treatments, we employed weighted gene co-expression network analysis (WGCNA) followed by canonical correlation analysis (CCA) on the CNS and eyes, separately (Figure 1). Refer to Supplementary Text S3 for a detailed description of the methods for gene co-expression network construction and module detection, module eigengene correlation with behavioural traits, module membership vs gene significance, and identification of hub genes (Figures S3 – S21, Tables S1 and S2). Briefly, the gene-level counts from DESeq2 (v1.30.1) (Love et al., 2014) were used in the WGCNA package (v1.70-3) (Langfelder & Horvath, 2008) to construct a co-expression network and detect modules of genes. A module eigengene was calculated for each module of genes, representing the gene expression profiles of that module. CCA using package CCA (v1.2.1) (González et al., 2008) was used to explore the correlations between the two sets of variables from the same individual squid: ME set = module eigengenes from each module; traits set = COlevel (current-day or elevated) and behavioural traits (active time (s), distance (cm), speed (cm/s), time in Zone A (s), whether the squid displayed an exploratory/aggressive interaction (yes/no), number of exploratory/aggressive interactions). For those modules identified by CCA to be correlated with each trait, the Pearson correlation of module membership (MM, higher value indicates the gene is more highly connected to the given module) and gene significance (GS, higher value indicates a more biologically relevant gene) was calculated (Langfelder & Horvath, 2008). A correlation of GS and MM imply that genes more highly connected with a given module also tend to be more highly correlated with the given trait, providing another measure for the importance of this module with the given trait (Langfelder & Horvath, 2008). This identified the final modules of interest, within which the MM and GS values of each gene were used as a screening method to identify biologically relevant, highly interconnected hub genes (Fuller et al., 2007; Horvath & Dong, 2008; Langfelder & Horvath, 2008), i.e. to find genes correlated with CO2 treatment and each behavioural trait. Hub genes were defined as those genes within the final modules of interest with a very strong correlation with the module (MM > 0.8) and a moderate correlation with the given trait (GS > 0.4).
All hub genes for CO2 treatment were compared across tissues to identify hub genes for CO2 treatment that are CNS-specific, eyes-specific or found in both tissues. Hub genes for CO2 treatment that were also a hub gene for one or more behavioural traits were identified as genes correlated with the associated OA-induced behavioural change. Finally, functional enrichment analysis was used for the CNS-specific CO2 treatment hub genes that were also hub genes for all three activity traits in the CNS using over-representation analysis (ORA) in clusterProfiler (v3.18.1) (Yu et al., 2012) with the hypergeometric test. This determined if any GO terms were significantly over-represented within the list of hub genes. All other groups of CO2 treatment hub genes had 25 or fewer genes and thus ORA was not used. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg method and a significance threshold of padj < 0.05 was used.