Phylogenomic analysis
Orthology assignment
To identify orthologous and paralogous genes, we compared 30
whole-genome protein-coding gene sets from teleost fish (Supplementary
Table 3) with OrthoFinder v2.3.12 (Emms and Kelly 2015) using default
parameters. The gene sets comprised 27 species from our previous
dataset (Natsidis et al. 2019), two additional Tetraodontidae species
(Takifugu bimaculatus and T. flavidus), and L. sceleratus. Before
running OrthoFinder, the longest isoform of each gene was retained
using the primary_transcript.py script provided with the OrthoFinder
suite. For L. sceleratus, only the longest isoforms longer than 30
amino acids were extracted with a custom script (longestIsoforms.py)
and used in the analysis.
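The isoform selection described for L. sceleratus can be illustrated with a minimal Python sketch. This is not the longestIsoforms.py script itself, which is not reproduced here. The function name, the (gene, isoform, sequence) input layout, and the handling of a trailing stop symbol are assumptions made for illustration only.

```python
MIN_LENGTH = 30  # amino acids; only isoforms longer than this are kept


def longest_isoforms(records, min_length=MIN_LENGTH):
    """Return {gene_id: (isoform_id, sequence)} with the longest isoform per gene.

    `records` is an iterable of (gene_id, isoform_id, protein_sequence)
    tuples (a hypothetical input layout). Isoforms with `min_length`
    residues or fewer are discarded; ties keep the first isoform seen.
    """
    best = {}
    for gene_id, isoform_id, seq in records:
        seq = seq.rstrip("*")  # assumption: drop a trailing stop-codon symbol
        if len(seq) <= min_length:
            continue
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (isoform_id, seq)
    return best
```

A gene whose isoforms are all 30 residues or shorter is absent from the result, so it does not enter the orthology assignment.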
Species tree reconstruction
For the phylogenomic analysis, the orthogroups produced by OrthoFinder
were filtered to keep those containing a single gene per species,
thereby avoiding the inclusion of paralogs. Of these, we kept the
orthogroups represented in at least 26 of the 30 taxa analysed, using a
custom Python script (filtered_orthogroups.py). The genes of each
orthogroup were aligned using MAFFT v7.453 (Katoh and Standley 2013) in
--auto mode. The aligned orthogroups were then concatenated using a
Python script by P. Natsidis
(https://github.com/pnatsi/Sparidae_2019/blob/master/concatenate.py).
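The filtering and concatenation steps can be sketched in Python as follows. This is an illustrative sketch, not the filtered_orthogroups.py or concatenate.py scripts. The dictionary-based data structures are hypothetical, and padding absent taxa with gap characters is an assumption about how incomplete orthogroups enter the supermatrix.

```python
def filter_orthogroups(orthogroups, min_taxa=26):
    """Keep single-copy orthogroups present in at least `min_taxa` species.

    `orthogroups` maps an orthogroup ID to {species: [gene IDs]}
    (hypothetical layout). Returns {orthogroup ID: {species: gene ID}}.
    """
    kept = {}
    for og_id, members in orthogroups.items():
        present = {sp: genes for sp, genes in members.items() if genes}
        single_copy = all(len(genes) == 1 for genes in present.values())
        if single_copy and len(present) >= min_taxa:
            kept[og_id] = {sp: genes[0] for sp, genes in present.items()}
    return kept


def concatenate(alignments, taxa):
    """Join per-orthogroup alignments into one sequence per taxon.

    `alignments` is a list of {species: aligned sequence}. A taxon that is
    missing from an alignment is padded with gaps (an assumption).
    """
    supermatrix = {sp: [] for sp in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("aligned sequences must share one length")
        width = lengths.pop()
        for sp in taxa:
            supermatrix[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in supermatrix.items()}
```

With the thresholds used here, an orthogroup with one gene in each of 26 species passes the filter, whereas one with two genes in any species is rejected regardless of how many taxa it covers.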
The concatenated alignment was filtered with Gblocks v0.91b (Castresana
2000) to exclude poorly aligned regions, with the following parameters:
‘Allowed Gap Positions’ was set to half, ‘Minimum Length of a Block’ was
set to 8, ‘Minimum Number of Sequences for a Flanking Position’ was set
to 20, and ‘Minimum Number of Sequences for a Conserved Position’ was
set to 18.
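For readers who want to reproduce these settings from the command line, the sketch below builds a Gblocks argument list in Python. The mapping of the named parameters to the -b1, -b2, -b4 and -b5 options follows the Gblocks documentation. The exact invocation used in the original analysis is not stated, so the input file name and the helper function are hypothetical.

```python
def gblocks_command(alignment, n_taxa, conserved=18, flanking=20,
                    min_block=8, gaps="h"):
    """Build a Gblocks argument list for a protein alignment.

    -b1: minimum number of sequences for a conserved position
    -b2: minimum number of sequences for a flanking position
    -b4: minimum length of a block
    -b5: allowed gap positions (n = none, h = with half, a = all)
    """
    # Gblocks requires b1 to exceed half the sequences and b2 >= b1.
    if not n_taxa / 2 < conserved <= n_taxa:
        raise ValueError("b1 must be > n_taxa / 2 and <= n_taxa")
    if not conserved <= flanking <= n_taxa:
        raise ValueError("b2 must be >= b1 and <= n_taxa")
    return [
        "Gblocks", alignment,
        "-t=p",                # protein data
        f"-b1={conserved}",
        f"-b2={flanking}",
        f"-b4={min_block}",
        f"-b5={gaps}",
    ]


# Hypothetical file name; 30 taxa as in this study.
cmd = gblocks_command("supermatrix.fasta", n_taxa=30)
```

The resulting list can be passed to subprocess.run. The validity checks reflect the constraint that, with 30 taxa, the conserved-position threshold must lie above 15 and must not exceed the flanking-position threshold, which the values 18 and 20 satisfy.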
We then reconstructed the phylogenetic tree with RAxML-NG v0.9.0 (Kozlov
et al. 2019). To select the best-fitting substitution model, we used
ModelTest-NG v0.1.6 (Darriba et al. 2019), setting the --topology
parameter to maximum likelihood (ml) mode. The phylogenomic inference
was run under the selected model, JTT+I+G4+F. To assess branch support,
we ran 100 bootstrap replicates. The final tree was visualized in
R/RStudio (RStudio Team 2021) with a custom script (phylo_tree_plot.r),
using Lepisosteus oculatus as the outgroup.
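The model selection and tree inference steps can be sketched as command lines assembled in Python. The options shown exist in ModelTest-NG and RAxML-NG, but the exact commands used in the original analysis are not given. The file names, output prefixes, random seed, and the use of the combined --all mode (tree search plus bootstrapping in one run) are assumptions.

```python
def modeltest_command(msa, prefix="modeltest"):
    """ModelTest-NG call for protein data with an ML starting topology."""
    return [
        "modeltest-ng",
        "--input", msa,
        "--datatype", "aa",
        "--topology", "ml",
        "--output", prefix,   # hypothetical output prefix
    ]


def raxml_command(msa, model="JTT+I+G4+F", bootstraps=100,
                  prefix="species_tree", seed=12345):
    """RAxML-NG call: ML search plus bootstrap replicates (--all mode)."""
    return [
        "raxml-ng", "--all",
        "--msa", msa,
        "--model", model,
        "--bs-trees", str(bootstraps),
        "--prefix", prefix,   # hypothetical output prefix
        "--seed", str(seed),  # assumed seed, for reproducibility only
    ]


# Hypothetical input file name.
model_cmd = modeltest_command("supermatrix.filtered.fasta")
tree_cmd = raxml_command("supermatrix.filtered.fasta")
```

Fixing the seed makes a rerun repeatable. The model string passed to --model is the one reported by the model selection step, here JTT+I+G4+F.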