Lineage characterisation and meta-phylogeographical patterns
Clusters at 3% and 15% similarity were used: 3% clusters are considered a proxy for species and are hereafter referred to as “OTUs”, while 15% clusters are lineages of one or more species and are hereafter referred to as “15% lineages”. We evaluated the genetic diversity, distribution, and degree of habitat specificity of each OTU and 15% lineage. We then tested the relative roles of habitat and geographical distance in the diversification of soil fauna within the island. The number of haplotypes was recorded as a measure of the genetic richness of each OTU, and OTUs were classified as “single haplotype” or “multiple haplotypes”. At the level of 15% lineages, and under the assumption that each arises from a single colonisation of Tenerife, the number of OTUs within each 15% lineage was used to classify it as “non-diversified” or “diversified”, according to whether it included one or multiple OTUs within the island. BLAST searches (blastn -outfmt 5 -evalue 0.001) against a reference library were used to classify OTUs as “non-endemic” if similarity with non-Canarian sequences was ≥97%, and as “likely introduced” if the similarity was ≥99%. The reference library included all sequences on BOLD (database downloaded on 3-07-2020), together with COI sequences from southern Iberia (Arribas et al., 2020) and COI Collembola sequences from outside the Canary Islands (Cicconardi et al., 2017).
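The classification rules in this paragraph can be summarised in a minimal Python sketch. It is an illustration only, not the code used in the study. The function names and the input format (a haplotype count, an OTU count, and the best percent identity to a non-Canarian reference sequence) are assumptions. The two origin flags are also assumed to be nested, so that an OTU with ≥99% identity is flagged as non-endemic as well.

```python
def haplotype_class(n_haplotypes):
    """Classify an OTU by its genetic richness (number of haplotypes)."""
    if n_haplotypes < 1:
        raise ValueError("an OTU must contain at least one haplotype")
    return "single haplotype" if n_haplotypes == 1 else "multiple haplotypes"


def lineage_class(n_otus):
    """Classify a 15% lineage by the number of OTUs it includes on the island."""
    if n_otus < 1:
        raise ValueError("a lineage must contain at least one OTU")
    return "non-diversified" if n_otus == 1 else "diversified"


def origin_flags(best_identity_pct):
    """Flag an OTU from its best percent identity to a non-Canarian sequence.

    best_identity_pct is None when no BLAST hit was retained
    (hypothetical convention for this sketch).
    """
    if best_identity_pct is None:
        return {"non_endemic": False, "likely_introduced": False}
    return {
        "non_endemic": best_identity_pct >= 97.0,        # threshold from the text
        "likely_introduced": best_identity_pct >= 99.0,  # threshold from the text
    }
```

For example, `origin_flags(98.2)` flags an OTU as non-endemic but not as likely introduced, while `lineage_class(3)` returns "diversified".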
To explore OTU and 15% lineage distributions, we recorded for each OTU and 15% lineage the number of sampling sites with a presence (number of occurrences), the maximum geographical distance between occurrences, and the habitats with occurrences, the latter summarised using Venn diagrams. Habitat specificity was estimated for each entity as the proportion of its occurrences in a particular habitat; entities with 80-100% of occurrences in one habitat were considered to have high habitat specificity. Habitat specificity was estimated for entities sampled in n or more sites, with n = 3, 4, 5, and 6. Finally, we explored the structure of genetic diversity within each OTU and 15% lineage for which the product of its number of sites and its number of haplotypes was ≥ 15. We first tested the relationship between genetic distance (F84 model) and geographic distance (Euclidean distance between sampling sites). This relationship was assessed by fitting a linear model with the glm function (link = “identity”), randomising spatial distances 1000 times, and computing the proportion of times in which the model deviance was smaller than the randomised model deviance, as in Gómez-Rodríguez & Baselga (2018). Geographic distances were calculated using the R package gdistance as before, with calculations performed for each pair of sites and with the upper and lower limits of permitted movement set to the highest value of the two sites (plus 100 meters) and the lowest value (minus 400 meters), respectively. We applied these restrictions to avoid shortest paths transgressing unfavourable habitats over the top of the island, while also allowing paths to cross the valley separating the central region of Tenerife from the Anaga peninsula, and facilitating connectivity over cliffs separating coastal sites.
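Three steps of this paragraph can be sketched in Python: the habitat specificity estimate, the sites-by-haplotypes filter, and the randomisation test of genetic against geographic distance. This is a hedged illustration rather than the study's R workflow. All function names and habitat labels are hypothetical. The linear model is assumed to be Gaussian, so that its deviance equals the residual sum of squares, and the randomisation is assumed to shuffle the vector of pairwise geographic distances.

```python
import random
from collections import Counter


def habitat_specificity(site_habitats, min_sites=3):
    """Return (dominant habitat, proportion, high-specificity flag).

    site_habitats holds one habitat label per site where the entity occurs.
    Returns None when the entity was sampled in fewer than min_sites sites.
    """
    if len(site_habitats) < min_sites:
        return None
    habitat, count = Counter(site_habitats).most_common(1)[0]
    proportion = count / len(site_habitats)
    return habitat, proportion, proportion >= 0.8  # 80-100% = high specificity


def eligible_for_structure(n_sites, n_haplotypes):
    """Filter from the text: sites multiplied by haplotypes must be >= 15."""
    return n_sites * n_haplotypes >= 15


def linear_deviance(x, y):
    """Residual sum of squares of the least-squares line y ~ a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def randomisation_support(geo_dist, gen_dist, n_perm=1000, seed=0):
    """Proportion of randomisations in which the observed model deviance
    is smaller than the deviance of the model fitted to shuffled distances."""
    rng = random.Random(seed)
    observed = linear_deviance(geo_dist, gen_dist)
    shuffled = list(geo_dist)
    smaller = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if observed < linear_deviance(shuffled, gen_dist):
            smaller += 1
    return smaller / n_perm
```

A proportion close to 1 indicates that the observed fit is better than almost all randomised fits. Running `habitat_specificity` with `min_sites` set to 3, 4, 5, and 6 in turn mirrors the sensitivity analysis over n described in the text.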
In addition, we tested the correlation between genetic distance (F84 model) among haplotypes and their distribution across the four habitats, using permutational ANOVAs with 999 permutations and habitat as the grouping factor. To graphically summarise patterns of haplotype relatedness and habitat association, we estimated and plotted haplotype networks for all 15% lineages including four or more haplotypes, using the function mjn of the R package pegas (Paradis, 2010). For 15% lineages with more than 40 haplotypes (four cases), the mjn function could not be applied, and networks were instead estimated with the haploNet function, which uses an infinite site model and uncorrected distances.
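The permutational ANOVA on a genetic distance matrix can be illustrated with a minimal Python sketch of a distance-based pseudo-F test. It is not the implementation used in the study, and the function names are hypothetical. The sketch assumes that each haplotype carries a single habitat label, that at least two habitats are present, and that within-habitat distances are not all zero.

```python
import random


def pseudo_f(dist, groups):
    """Pseudo-F statistic from a symmetric distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, summed over groups.
    ss_within = 0.0
    for lab in labels:
        idx = [i for i, g in enumerate(groups) if g == lab]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(idx)
                         for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutational_anova(dist, groups, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value).

    Group labels are shuffled n_perm times; the p-value counts permuted
    statistics at least as large as the observed one, plus the observed itself.
    """
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small relative tolerance guards against floating-point ties.
        if pseudo_f(dist, shuffled) >= observed * (1 - 1e-9):
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable p-value is 0.001. For very small numbers of haplotypes the number of distinct label arrangements, rather than the number of permutations, limits the attainable significance.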
Results