Background
Population structuring in the absence of obvious physical barriers has puzzled biologists for centuries. In oceanic environments, strong genetic structure is not expected: most marine animals are capable of exchanging migrants across large distances, and their genetic structure results from a combination of a long larval pelagic phase, high fecundity, large population sizes and adult migratory behavior (Faria et al., 2013). Yet, many studies have shown that several species have higher spatial genetic differentiation than expected considering their high dispersal potential (Palero et al., 2008; Pérez-Ruzafa et al., 2006). In the case of marine fish, structure can range from a lack of differentiation between oceans to significant structure within an ocean basin, challenging the simple concept of “open seas” and the assumption of high connectivity in marine environments (Graves, 1998). Assessing the existence of population structure in marine species capable of long-distance dispersal is essential to identify the various factors involved in population differentiation and diversification in the absence of complete physical barriers (Faria et al., 2021). This is especially relevant for conservation efforts, including stock management of commercially important species (Faria et al., 2013).
The Mediterranean Sea and the contiguous Northeastern Atlantic Ocean have been the focus of several phylogeographic and population genetic studies on marine fish (e.g. Patarnello et al., 2007; Tine et al., 2014). The Almeria-Oran Front, a well-defined oceanographic break situated east of the Strait of Gibraltar, has been suggested to be responsible for hindering gene flow between Mediterranean and Atlantic populations of many fish species, but it is far from being a universal barrier (Patarnello et al., 2007). The less studied Macaronesia, a group of archipelagos (Azores, Madeira and Canaries) separated from the Euro-African mainland by c. 100–1,900 km, has also been the target of several phylogeographic studies (e.g. Kasapidis et al., 2011; Sá-Pinto et al., 2008). This area is characterized by the presence of several oceanographic currents, e.g., the North Atlantic Current, the Azores Current and the Canary Current (Sala et al., 2013), that together with the apparent lack of physical barriers can strengthen the potential for gene flow. Therefore, it is not surprising that several studies have reported low population genetic differentiation within the Macaronesian region for different taxa (Faria et al., 2013), including fishes (Francisco et al., 2011; Stefanni et al., 2015). Species distributed across these regions can thus inform us about the existence of cryptic substructure and possible barriers to gene flow between populations.
One of the most important pelagic fish resources in Atlantic waters is the European sardine, Sardina pilchardus Walbaum, 1792. This species has an enormous economic value, especially in Southern Europe and Morocco, where it is the main target of the purse-seine fleets in Portugal and Spain, representing a major source of income for local economies (ICES, 2013). Recently reported low biomass levels (ICES, 2020) led to recommendations to reduce fishing in Southern Europe, with great economic impact. They also prompted us to reevaluate the current population structure of S. pilchardus, contributing to the ongoing discussion on how genetic information can inform stock delineation for management purposes (Caballero-Huertas et al., 2022).
The European sardine has a broad distribution from the Eastern Mediterranean to the North-East Atlantic, including the Azores, Madeira and the Canary Archipelagos, and is found along the African coast down to Senegal (Parrish et al., 1989). Like other marine pelagic fish, S. pilchardus shows schooling and migratory behavior and high dispersal capabilities, both at the larval and adult stages. In agreement with this, low levels of genetic differentiation were detected across the species distribution using allozymes (Chlaida et al., 2009; Chlaida et al., 2006; Laurent et al., 2007; Spanakis et al., 1989), mitochondrial DNA (mtDNA) (Atarhouch et al., 2006; Tinti et al., 2002), and microsatellites (Gonzalez & Zardoya, 2007; Kasapidis et al., 2011). Nevertheless, phenotypic variation in gill raker counts and head length (Andreu, 1969; Parrish et al., 1989) and mitochondrial haplotype frequency differences (Atarhouch et al., 2006) led to the proposal of two subspecies: S. pilchardus pilchardus (North Sea to southern Portugal), and S. pilchardus sardina (Mediterranean Sea and northwest African coast). Accordingly, otolith shapes differ between Atlantic and Mediterranean sardines (Jeema et al., 2015), and further suggest a subdivision between the Northern Mediterranean and the Alboran-Algero-Provençal basin (Jeema et al., 2015; Alemany & Alvarez, 1993). A study using 15 allozymes supports the latter (Ramon & Castro, 1997), but, unlike the otolith shapes, these markers suggest a discontinuity caused by the Almeria-Oran Front. When considering a large fraction of the European sardine Atlantic range, allozymes and microsatellites suggest that Madeira and Azores form a significantly differentiated group (Kasapidis et al., 2011). This mosaic of regional population structure, built by several independent studies, has mostly been attributed to geographical barriers that potentially hinder gene flow, which is otherwise expected to be high for the abundant and mobile S. pilchardus.
The phenotypic differences between groups might also have arisen from retention of adaptive phenotypes, and population structure in the Mediterranean was found to be associated with environmental variables (Antoniou et al., 2022). This prompted us to ask how genomic architecture contributes to the observed present-day population structure.
In this study, we produced a European sardine genomic data set consisting of whole-genome nuclear data and complete mitochondrial genomes for 88 individuals, which were analyzed together with data from 20 sardine individuals from a previous study (Barry et al., 2022), for a total of 108 samples from 16 locations across 5000 km of the species distribution range. This enabled us to investigate previously suggested barriers to gene flow, map the major genetic clusters that characterize S. pilchardus in a large part of its distribution, compare markers with different modes of inheritance, and gain a first insight into the genomic barriers contributing to the observed population structure.
Materials and Methods
Sample collection and DNA extraction
Samples were collected from 13 different geographical locations encompassing a large part of the species’ current distribution range (Figure 1A, Table 1). A total of 15 samples from three locations were collected during oceanographic surveys and the remaining 73 specimens, from ten distinct geographic locations, were sampled at local markets (Table S1). Sequence data for the samples from the Bay of Biscay, Gulf of Cadiz, Mar Menor and the Gulf of Lion (n=20) were obtained from Barry et al. (2022), adding further sampling locations to our dataset.
Total genomic DNA was extracted using Qiagen's DNeasy Blood & Tissue Kit (Hilden, Germany) according to the manufacturer's instructions, with the following modification: prior to elution in 100 µl AE buffer, samples were incubated at 37 °C for 10 minutes to increase DNA yield. DNA concentration and purity were verified using a NanoDrop Spectrophotometer and a Qubit Fluorometer. A commercial service (Novogene, China) produced TruSeq Nano DNA libraries and sequenced paired-end reads (150 base pairs (bp)) on a NovaSeq 6000. To assess the patterns of genetic differentiation of the European sardine, 81 samples were sequenced to < 3 X sequencing depth and seven to 20 X sequencing depth, where 3 X means that each position of the genome is covered by 3 reads on average (details in Additional file 1: Table S1). Raw data for 20 sardine individuals from Barry et al. (2022) were further processed using the same procedure as described in the next section (sequencing depth between 15 and 22 X). Table S1 indicates the assignment of samples to the different subsets considered for further analysis.
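To make the notion of sequencing depth used above concrete, the following minimal Python sketch relates the number of paired-end reads to the expected mean depth (total sequenced bases divided by genome size). It is an illustration only and not part of the analysis pipeline: the function names are hypothetical, and the 1 Gb genome size in the usage example is a round placeholder rather than a value taken from this study. Only the 150 bp read length comes from the text.

```python
import math


def mean_depth(n_read_pairs: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Expected mean depth = total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    # Paired-end sequencing yields two reads per fragment.
    total_bases = 2 * n_read_pairs * read_length_bp
    return total_bases / genome_size_bp


def read_pairs_for_depth(target_depth: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Read pairs needed to reach a target mean depth, rounded up."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(target_depth * genome_size_bp / (2 * read_length_bp))


if __name__ == "__main__":
    GENOME_SIZE_BP = 1_000_000_000  # hypothetical placeholder, not a measured value
    READ_LENGTH_BP = 150            # paired-end read length stated in the text

    # 10 million read pairs of 2 x 150 bp over a 1 Gb genome
    print(mean_depth(10_000_000, READ_LENGTH_BP, GENOME_SIZE_BP))
    # read pairs needed for 20 X under the same assumptions
    print(read_pairs_for_depth(20, READ_LENGTH_BP, GENOME_SIZE_BP))
```

This is an expectation over the whole genome; realized depth at individual positions varies around the mean, which is why low-depth samples are typically analyzed with methods that account for genotype uncertainty.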