Genetic maps

Because a high percentage of markers returned data and were polymorphic in each of the four families, we constructed a genetic map for each family. The total genetic distances ranged from 999.2 to 1350.1 cM (Table 1). The average Pearson’s correlation (r) between physical and genetic positions ranged from 0.86 to 0.95 genome-wide, and the genetic maps covered 94.3% to 99.1% of the reference genome (Supplementary files 1-4). The parental genetic maps were similar within each chromosome, but some regions failed to recombine in a specific parent. This might be due to structural variation in the genome, for example the region on chromosome 1 between 4 and 12 Mbp in the female parental map of the MN family (Figure 4).
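
As an illustration of how the correlation between physical and genetic positions can be computed, the following is a minimal sketch. The marker positions are hypothetical values for a single chromosome, not data from this study, and the function name `pearson_r` is illustrative. The plateau in genetic position mimics a region that fails to recombine in one parent.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical markers on one chromosome (assumed values for illustration only).
physical_mbp = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18]
# Genetic positions (cM); the flat stretch between 4 and 12 Mbp mimics a
# region without recombination in this parent.
genetic_cm = [0, 3, 10, 10, 10, 10, 10.5, 18, 25, 31]

r = pearson_r(physical_mbp, genetic_cm)
print(f"r = {r:.2f}")
```

A non-recombining block lowers r without changing marker order, which is one reason a per-chromosome correlation can drop while the map remains collinear with the reference.
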
We analyzed the markers that failed to map to determine whether they were poor markers in all families. The vast majority of these problematic markers were either monomorphic (homozygous, i.e. providing no information on recombination in either parent) or distorted (Table 2); in total, these two categories represented 811 to 924 markers in each family. Few markers (3 to 19) fell into three other categories: mapped to the wrong linkage group (LG), did not map to any LG, or disproportionately inflated genetic distances. Only 8.4% of the monomorphic markers and 3.4% of the distorted markers were problematic in three or four families; most (65.0% and 67.8%, respectively) were problematic in only one family (Figure 3). We compared the distance between distorted markers with that of a random sample of the same number of markers from the entire set; the distance between distorted markers was significantly smaller than the random expectation (Supplementary figure S6, Mann-Whitney test, P < 1E-13). Thus, the distorted markers are linked; they may reflect differences in genome biology among taxa and may be problematic only in a specific genetic background. However, when we constructed a consensus map from an artificial population of 600 vines (150 vines from each family), 83% of the markers in this panel could be genetically mapped, with an average Pearson’s correlation (r) between physical and genetic positions of 0.95 across all chromosomes (Supplementary file 5). This high mapping rate indicates that these 83% of the markers should return good polymorphic data and segregate in a Mendelian fashion in most Vitis taxa.
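
The comparison of distances between distorted markers and a random marker sample can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used here: the marker panel, the clustered "distorted" set, and the single 20 Mbp chromosome are hypothetical; "distance" is taken to mean the distance between adjacent markers; and the test is a one-sided Mann-Whitney U test with a tie-corrected normal approximation.

```python
import random
from math import erfc, sqrt


def adjacent_distances(positions):
    """Distances (bp) between neighbouring markers after sorting by position."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def mann_whitney_less(x, y):
    """One-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p); a small p suggests values in x tend to be smaller than in y.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / sqrt(var_u)
    p = 0.5 * erfc(-z / sqrt(2))  # lower-tail probability of the standard normal
    return u, p


# Hypothetical panel: one marker every 10 kb on a 20 Mbp chromosome.
panel = list(range(0, 20_000_000, 10_000))
# Hypothetical distorted markers: 30 neighbouring markers in one cluster.
distorted = panel[400:430]
# Random sample of the same number of markers from the whole panel.
rng = random.Random(1)
random_set = rng.sample(panel, len(distorted))

u_stat, p_value = mann_whitney_less(
    adjacent_distances(distorted), adjacent_distances(random_set)
)
print(f"U = {u_stat:.1f}, one-sided P = {p_value:.3g}")
```

With real data, the distances would be computed per chromosome and pooled, and a library routine could replace the hand-written test.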

GWAS for flower sex

A total of 1,712 and 1,784 post-imputation, filtered markers were analyzed for association with the flower sex trait, measured in 157 and 509 vines from the HC and RS families, respectively. Trait segregation ratios of 71:76 and 86:18 matched the expectations of 1:1 for HC and 3:1 for RS, respectively, based on non-significance in a χ² test. Thirteen and seventeen markers, respectively, significantly predicted flower sex after Bonferroni correction for multiple comparisons. Marker chr2_4825658 (chromosome 2 at position 4,825,658 bp) was the most significant marker in both the HC (P < 9.6E-17) and RS (P < 2.8E-09) families, explaining 82.0% and 50.4% of the phenotypic variation, respectively (Figure 5).
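
The segregation test and the multiple-testing threshold described above can be sketched as follows. This is a minimal illustration using the counts and marker numbers reported in this section; the significance level of 0.05 is an assumption (it is not stated in the text), and the function names are illustrative.

```python
from math import erfc, sqrt


def chi2_two_class(observed, expected_ratio):
    """Chi-square goodness-of-fit test for two phenotype classes (1 df).

    observed: counts in the two classes, e.g. (71, 76)
    expected_ratio: expected segregation ratio, e.g. (1, 1) or (3, 1)
    Returns (chi-square statistic, P-value).
    """
    n = sum(observed)
    total = sum(expected_ratio)
    expected = [n * part / total for part in expected_ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2.0))


def bonferroni_threshold(alpha, n_tests):
    """Per-marker significance threshold after Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05  # assumed significance level, not stated in the text

chi2_hc, p_hc = chi2_two_class((71, 76), (1, 1))  # HC family, expected 1:1
chi2_rs, p_rs = chi2_two_class((86, 18), (3, 1))  # RS family, expected 3:1
print(f"HC: chi2 = {chi2_hc:.3f}, P = {p_hc:.3f}")
print(f"RS: chi2 = {chi2_rs:.3f}, P = {p_rs:.3f}")

threshold_hc = bonferroni_threshold(ALPHA, 1712)  # markers tested in HC
threshold_rs = bonferroni_threshold(ALPHA, 1784)  # markers tested in RS
print(f"Bonferroni thresholds: HC {threshold_hc:.2e}, RS {threshold_rs:.2e}")
```

Under the assumed level of 0.05, neither ratio departs significantly from its expectation, and a marker is declared significant only if its P-value falls below roughly 3E-05 in either family.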

Discussion

Genus-wide transferable marker design based on core genome 

A set of ‘universal’ genetic markers that work across related taxa is desirable in many genetic studies. In marker-assisted breeding, universal markers can be used in distant hybridization and gene introgression\cite{Chagné2004,Brondani2006,Diaz2011}. In molecular ecology and evolutionary studies, universal markers allow comparison of genetic characters among related species\cite{Singh_2012,Bernardes_2018}. In taxa enriched in economically important species, universal markers that are transferable across species can decrease the time and effort needed to develop unique markers for each species\cite{Kuleung2004,Pan_2018}. While the transferability of low-throughput microsatellite (SSR) markers is relatively good, ranging from 27% to 77% in different taxa of plants and animals\cite{Barbará2007}, the transferability of high-throughput SNP markers has been as low as 2%\cite{Vezzulli2008,Chagné2012}.
In this study, we developed and validated a pipeline for designing universal markers that work for the whole Vitis genus, which diverged 20 Mya. Using the rhAmpSeq genotyping platform, 93% of markers returned data for all four families tested, and around 70% of markers were polymorphic in each family. The genetic maps built for these four populations were consistent in marker order and recombination rate. Although 10% to 20% of markers in each family had segregation ratios that deviated from Mendelian expectations, these markers were family-specific and clustered on chromosomes. Moreover, in the consensus genetic map, the vast majority of the markers were informative for map construction, indicating good marker transferability. Further, in the two families where the sex locus was analyzed, the most significant marker, which explained the most phenotypic variation, was the same. In other words, not only were the random markers transferable, but the functional markers were also transferable. Thus, it appears that markers designed from a genus-wide core genome are transferable in key aspects, including amplification, polymorphism, segregation, and marker-trait association.
The design of transferable markers benefited from the construction of a genus-wide core genome that takes collinearity into account. Previously, markers designed from shotgun resequencing had limited transferability because only local genetic variation could be accessed, and large and complex structural variation was often missed. Any long collinear block conserved within a taxon is suggestive of strong selection against structural variation within the block, which increases the probability of designing markers with consistent occurrence in the genome and a consistent segregation pattern. For this study, long-range scaffolding with 10X Genomics de novo assembly was an enabling technology for identifying collinear blocks at an inexpensive price point (about $3500 per 450 Mbp genome). We also observed a 70% design rate for the core transcriptome in another study. By using genus-wide polymorphism to design primers on conserved sequences that flank regions with moderate polymorphism, we obtained markers that reliably returned data and were informative in most cases.

The advantage of the rhAmpSeq genotyping platform for highly diverse and heterozygous species

Previously, we found that the AmpSeq genotyping platform outperforms GBS for highly diverse and heterozygous species, due to limited missing data, increased coverage and accuracy at heterozygous sites, and elevated transferability among species. Unlike SNP arrays or KASP, which typically target two alleles per marker or site, the AmpSeq genotyping platform allows identification of numerous novel alleles as a short haploblock, because the entire amplified region (typically 200 to 250 bp) is sequenced through NGS. In this study, rhAmpSeq markers had a mean of 5.7 alleles per marker across all the parental accessions that were genotyped (Supplementary figure 8). This high information content, even coverage, and unbiased sequencing of amplicons make the platform applicable to population genetics and ecology studies. Relative to AmpSeq, the rhAmpSeq technology simply adds an RNA base and a blocker at the 3’ end of each primer. When the match between the primer and the template is perfect, the RNA base and blocker are cleaved by the RNase H2 enzyme [32]. This step increases genotyping specificity and raises the multiplexing capacity to as many as 5000 markers per reaction.