Phylogenomic analysis
Orthology assignment
To identify paralogous and orthologous genes, we compared 30 whole-genome protein-coding gene sets from teleost fish (Supplementary Table 3) with OrthoFinder v2.3.12 (Emms and Kelly 2015) using default parameters. The gene sets comprised 27 species from our previous dataset (Natsidis et al. 2019), two additional Tetraodontidae species (Takifugu bimaculatus and T. flavidus), and L. sceleratus. First, the longest isoform of each gene was retained using the primary_transcript.py script provided with the OrthoFinder suite. For L. sceleratus, only the longest isoforms exceeding 30 amino acids were extracted with a custom script (longestIsoforms.py) and used in the analysis.
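The isoform selection described for L. sceleratus can be illustrated with a minimal sketch. This is not the longestIsoforms.py script itself: the function names are hypothetical, and the header layout (`>protein_id gene=GENE_ID`) is an assumption made only for illustration; real annotations may encode the gene identifier differently.

```python
from typing import Dict, Iterable, Iterator, Tuple

MIN_LENGTH = 30  # isoforms must be longer than this many amino acids


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_id_from_header(header: str) -> str:
    """Assumed header layout: 'protein_id gene=GENE_ID' (hypothetical)."""
    for field in header.split():
        if field.startswith("gene="):
            return field[len("gene="):]
    # Fall back to the protein identifier when no gene tag is present.
    return header.split()[0]


def longest_isoforms(lines: Iterable[str],
                     min_length: int = MIN_LENGTH) -> Dict[str, Tuple[str, str]]:
    """Return {gene_id: (header, sequence)} keeping the longest isoform per gene.

    Isoforms of length <= min_length are discarded before the comparison.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(lines):
        seq = seq.rstrip("*")  # drop a trailing stop-codon symbol, if any
        if len(seq) <= min_length:
            continue
        gene = gene_id_from_header(header)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best
```

Passing an open FASTA file handle to `longest_isoforms` returns one record per gene. Genes whose isoforms are all 30 amino acids or shorter are absent from the result.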
Species tree reconstruction
For the phylogenomic analysis, the orthogroups produced by OrthoFinder were filtered to keep those containing a single gene per species, thereby avoiding the inclusion of paralogs. Of these, we retained the orthogroups represented in at least 26 of the 30 taxa analysed, using a custom Python script (filtered_orthogroups.py). The genes of each orthogroup were aligned using MAFFT v7.453 (Katoh and Standley 2013) with the --auto option. The aligned orthogroups were then concatenated using a Python script by P. Natsidis (https://github.com/pnatsi/Sparidae_2019/blob/master/concatenate.py). The resulting alignment was filtered with Gblocks v0.91b (Castresana 2000) to exclude poorly aligned regions, with the following parameters: 'Allowed Gap Positions' was set to half, 'Minimum Length of a Block' was set to 8, 'Minimum Number of Sequences for a Flanking Position' was set to 20, and 'Minimum Number of Sequences for a Conserved Position' was set to 18.
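The orthogroup filtering and concatenation steps can be sketched as follows. This is an illustrative sketch, not the filtered_orthogroups.py or concatenate.py scripts used in the study. It assumes the tab-separated OrthoFinder orthogroup table (one column per species, genes within a cell separated by commas), and the function names are hypothetical. Padding absent taxa with gap characters during concatenation is likewise an assumption about how missing data could be handled.

```python
import csv
from typing import Dict, Iterable, List


def single_copy_orthogroups(tsv_lines: Iterable[str],
                            min_taxa: int) -> Dict[str, Dict[str, str]]:
    """Return {orthogroup: {species: gene}} for orthogroups that have at most
    one gene per species and are represented in at least `min_taxa` species."""
    reader = csv.reader(tsv_lines, delimiter="\t")
    species = next(reader)[1:]  # header row: 'Orthogroup', then species names
    kept: Dict[str, Dict[str, str]] = {}
    for row in reader:
        if not row:
            continue
        og_id, cells = row[0], row[1:]
        members: Dict[str, str] = {}
        has_paralogs = False
        for sp, cell in zip(species, cells):
            genes = [g.strip() for g in cell.split(",") if g.strip()]
            if len(genes) > 1:  # more than one gene in a species: paralogs
                has_paralogs = True
                break
            if genes:
                members[sp] = genes[0]
        if not has_paralogs and len(members) >= min_taxa:
            kept[og_id] = members
    return kept


def concatenate(alignments: List[Dict[str, str]],
                taxa: List[str]) -> Dict[str, str]:
    """Concatenate per-orthogroup alignments ({taxon: aligned sequence}).

    Taxa missing from an alignment are padded with gaps of that alignment's width.
    """
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences in an alignment must have equal length")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln.get(t, "-" * width))
    return {t: "".join(chunks) for t, chunks in parts.items()}
```

With the thresholds reported here, the filter would be called with `min_taxa=26` on the 30-species table.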
We then ran RAxML-NG v0.9.0 (Kozlov et al. 2019) for phylogenetic tree reconstruction. To select the best-fitting substitution model, we used ModelTest-NG v0.1.6 (Darriba et al. 2019), setting the --topology parameter to maximum likelihood (ml). The phylogenomic inference was run using the selected model, JTT+I+G4+F. To assess branch support, we ran 100 bootstrap replicates. The final tree was visualized in R/RStudio (RStudio Team 2021) with a custom script (phylo_tree_plot.r), using Lepisosteus oculatus as the outgroup.
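For orientation, the model selection and tree inference steps could be assembled as command lines along the following lines. This is a hedged sketch rather than the exact invocation used in the study. The file names, output prefix, and thread counts are hypothetical, and running the tree search and bootstrapping in a single RAxML-NG call (`--all`) is an assumption. The helper functions only build argument lists and do not execute anything.

```python
from typing import List


def modeltest_command(msa: str, threads: int = 4) -> List[str]:
    """Build a ModelTest-NG call for a protein alignment with ML topologies."""
    return [
        "modeltest-ng",
        "--input", msa,
        "--datatype", "aa",    # amino-acid data
        "--topology", "ml",    # maximum-likelihood starting topology
        "--processes", str(threads),
    ]


def raxml_command(msa: str, model: str, bootstraps: int,
                  prefix: str, threads: int = 4) -> List[str]:
    """Build a RAxML-NG call that runs the ML search plus bootstrapping.

    Using '--all' (search + bootstrap + support mapping in one run) is an
    assumption; the steps can also be run separately.
    """
    return [
        "raxml-ng", "--all",
        "--msa", msa,
        "--model", model,
        "--bs-trees", str(bootstraps),
        "--prefix", prefix,
        "--threads", str(threads),
    ]


# Hypothetical file names; the model and replicate count are those reported in the text.
MODELTEST = modeltest_command("concatenated_gblocks.fasta")
RAXML = raxml_command("concatenated_gblocks.fasta", "JTT+I+G4+F", 100, "species_tree")
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the corresponding program is installed and on the PATH.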