Phylogeny and Gene Family Size Analysis
For the gene family expansion/contraction analysis, we first reconstructed a phylogenomic tree for N. riversi and the 11 beetle species with genomes in RefSeq (O’Leary et al.2016), as well as Drosophila melanogaster (Insecta: Diptera, hereafter Dmel) as an outgroup for the Coleoptera. The latest genomes were used to identify single copy orthologs using BUSCO v2.0 with the Endopterygota reference dataset. More than two thousand single copy orthologs were discovered from the 13 genomes. We then used MAFFT v7.471 (Katoh & Standley 2013) to align the amino acid (aa) sequences of 13 species for each ortholog, and found 1,017 orthologs had sequences in all 13 species. In order to reduce any effects of misalignment, we used trimAl v1.2 (Capella-Gutiérrez et al. 2009) to remove ambiguous bases and gaps. This trimming process reduced the total alignment size from 1,021,369 (including gaps) to 382,948 aas. We then used the R package kdetrees (Weyenberg et al.2014) to detect any gene tree outliers based on both topological similarity and branch length similarity to all gene trees. After removing 14 outliers, we concatenated the gene-specific alignments into a single alignment (1,013 orthologous genes, with 379,463 aas in total) and constructed a Maximum Likelihood (ML) tree using RAxML v8.2.12 (Stamatakis 2014). In RAxML, we applied the PROTGAMMAAUTO substitution model and designated D. melanogasteras the outgroup, and the branch support was evaluated with 100 bootstrap replicates. The final ultrametric tree was generated by calibrating the ML tree with fossil records using treePL (Smith & O’Meara 2012), with parameters obtained from the prime function. For fossil records, we used the estimated age of Tshekardocoleidae (oldest known beetle, 295 Mya) to define the minimum age of the most recent common ancestor of all Coleopteran species (Kirejtshuk 2020). The minimum estimation of crown Coleoptera (208.5 Mya) was used to define the minimum age of Polyphaga (Wolfe et al. 2016). The oldest Chrysomelidae fossil (Mesolpinus , 122 Mya) was used to define the minimum age of Chrysomeloidea (Hu et al. 2020; Kirejtshuket al. 2015), whereas the fossil weevil (Nemonychidae, 157 Mya) was used to define the maximum age of Chrysomeloidea (Behrensmeyer & Turner 2013).
With the resulting ultrametric tree, we set out to analyze changes in the size of gene families in N. riversi relative to other beetles, while accounting for phylogenetic history, using CAFE v3.0 (Ganote et al. 2018). We first identified all one to one orthologous genes among the twelve beetle species and Dmel (as outgroup) using OMA standalone v2.4.1 (Altenhoff et al. 2019). The input dataset included the precomputed all-against-all genomes of Tcas and Dpon from the OMA database, and the proteome sequences of the remaining ten beetle species and Dmel from RefSeq (O’Learyet al. 2016). This resulted in a count matrix of the number of genes in each hierarchical orthologous group (HOG). Among the 18,438 HOGs, we dropped one HOG (HOG06033) that had more than 200 gene copies to allow CAFE to find the optimized λ (rate of evolutionary change). We identified all rapidly expanding and contracting HOGs using a p-value threshold < 0.05. In addition, we relaxed this threshold to consider all expanding HOGs on the branch of N. riversi , and conducted a GSEA test using clusterProfiler (Yu et al. 2012) in R. We examined two levels of cutoff (0.05 and 0.1) for Benjamini-Hochberg adjusted p-values to identify enriched GO terms (Yuet al. 2012). The gene ontology terms associated with biological processes were further clustered based on semantic similarity using the Revigo webserver (Supek et al. 2011), using thesimRel score to collapse redundant terms at 0.7 similarity and the UniProt database to determine the gene ontology term abundance, and visualized using a CirGO plot (Kuznetsovaet al. 2019).