Genetic diversity and demographic inferences
Nucleotide diversity (Ɵπ ),Ɵw (Watterson’s theta, based on segregating sites (Watterson, 1975)) and Tajima’s D (TD, (Tajima, 1989)) were computed from the SFS for each species with custom scripts. Significance of TD was evaluated after 1,000 coalescent simulations of a constant population model with scaled size Ɵπ . To test whether sampled demes are isolated or belong to a structured meta-population and to eventually estimate connectivity, we devised three alternative evolutionary models for each species (Figure 1) within an approximate bayesian computation framework (ABC) framework. Model NS (non-structured) defined an isolated population characterized by a modern effective population size (Nmod ) switching instantaneously into an ancestral population size (Nanc ) atTc generations before present. Model FIM specifies a non-equilibrium finite island model defined by 100 demes exchangingNm migrants each generation under a symmetric migration matrix. The array of demes is instantaneously colonized Tcol generations before present from a population with an ancestral size (Nanc ). Model SST is similar to FIM but demes exchanging migrants only with their four neighbours (or less, if they are at the border of the array), in a steppingstone fashion. We performed 50,000 coalescent simulations from prior distributions using fastsimcoal v.2.6.0.3 (Excoffier, Dupanloup, Huerta-Sánchez, Sousa, & Foll, 2013),reproducing the exact number of individuals and loci for each species (Table 1). We first performed model selection through the random forest (RF) classification method implemented in the abcRF R package (Pudlo et al., 2016). We then performed 50,000 additional simulations under the most supported model in order to estimate demographic parameters with the abcRF regression method (Raynal et al., 2019). Both model selection and parameter estimation were computed with the following set of summary statistics: the SFS, Ɵπ ,Ɵw and TD. The first two axes of a Linear Discriminate Analysis performed on the previous statistics were also included for model selection in order to increase the accuracy of the estimates (Pudlo et al., 2016). Even though Ɵπ ,Ɵw and TD are function of the SFS, they convey additional information by the non-linear feature of the functions. Information redundancy among the considered summary statistics is accounted for by the RF algorithm. Model selection and parameter estimation were performed twice to check the consistency of the analyses, and cross validation (or confusion matrix for the model selection) was performed on the first of the two runs. The number of trees in each RF algorithm was chosen by monitoring the evolution of the out-of-bag error (Pudlo et al., 2016).
We investigated the variation in the effective population size (Ne ) through time by running the composite likelihood approach implemented in the stairwayplot v.0.2 software (Liu & Fu, 2015). We set the generation time to seven years for C. melanopterus (Maisano Delser et al., 2016) and to 10 years for the other species (Cortés, 2002; Pirog et al., 2019) for all demographic inferences. We applied a mutation rate per generation per site of 8.4×10-9 to the exon capture data of C. melanopterus (Maisano Delser et al. 2016) and of 1.93×10-8 to the RADseq data for the remaining three species. This mutation rate was determined by scaling genetic diversity between ddRAD (obtained under the same protocol of this study) and Exon Capture data from 12 C. melanopterus individuals from Moorea, French Polynesia (Supplementary Material).