For example, for the most widely used SNP genotyping array in maize, the Illumina MaizeSNP50 BeadChip, only 17% to 33% of the 49,585 high-quality markers, which were designed mostly for temperate germplasm, are polymorphic in populations derived from European maize inbred lines (Bauer 2013). This problem is more evident in highly diverse and heterozygous species. Among non-vinifera Vitis species, transferability drops to 2.3% for SNP markers developed from the heterozygous grape cultivar Pinot Noir \cite{Vezzulli2008}. The same issue has been reported for SNP arrays in cattle \cite{Michelizzi2010} and apple \cite{Chagné2012}. In our previous study, we found that the Ampseq platform outperformed SNP arrays and GBS in transferability \cite{Yang2016}. The aims of this study are (1) to develop a pipeline for designing transferable markers using the core genome of the Vitis genus and (2) to test the transferability of these markers on the high-fidelity rhAmpseq platform.
De novo genome assemblies of seven accessions in the Vitis genus
To construct a representative pan-genome for the genus Vitis and to assist grape breeding, we selected three accessions of wild species, three accessions of hybrid grapes (crosses between wild and domesticated species), and three accessions of widely cultivated modern cultivars of Vitis vinifera subsp. vinifera (Figure 1). The genomes of Sultanina and CabSauv were downloaded from a public database. We assembled the remaining seven genomes de novo using 10x Chromium technology, with long DNA molecules sequenced on the Illumina XTen in 150 bp paired-end mode. A total of XX to XX X raw sequences were collected and assembled with Supernova Assembler 2.0(). The contig N50 ranges from XX to XX and the scaffold N50 ranges from 278 kbp to 2.1 Mbp (Table 1). Supernova 2.0 generated two locally phased pseudohaplotypes for each accession. The average distance between adjacent SNPs that distinguish the two haploblocks ranges from XX to XX bp. Because the sequence similarity between the two haploblocks is high, only the pseudohap1 assembly was used to represent each accession in the downstream analysis. The molecule length, effective depth, and assembly statistics for each genome are given in Supplementary Table 1.
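For readers unfamiliar with the N50 statistic reported above, the following minimal Python sketch shows the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together span at least half of the total assembly length. The function name `n50` and the example lengths are illustrative only and are not taken from the Supernova output.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Standard definition: sort lengths from longest to shortest and
    return the length at which the running sum first reaches half
    of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must contain at least one positive value")


# Hypothetical scaffold lengths (bp), for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

The same function applies to contig and scaffold lengths alike; only the input list differs.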
Construction of anchors based on the pan-genome
To construct the pan-genome of the Vitis genus, the nine genomes were anchored onto version 12X.2 of the PN40024 reference genome to identify the core genome and the dispensable genome. The syntenic regions between the assemblies and the PN40024 reference are shown in Figure 1A. Collinearity between the assemblies and the reference is high at the genome level, which indicates that the overall assembly quality is good. The assemblies cover the chromosome arms and the pericentromeric regions evenly, but scaffolds are smaller and less contiguous in the pericentromeric regions than in the arms, which indicates that pericentromeric regions remain challenging to assemble de novo with 10x technology. As expected, the wild species are more divergent from the reference than the modern cultivars. For example, on average 66% of chromosome 9 is collinear with the wild species, whereas this percentage increases to 88% for the cultivars (Figure 1C, Supplementary Table 1). We defined the coverage of a chromosomal region as the number of times it occurs in a collinear alignment between the genome assemblies and the PN40024 reference. About 10% of the reference genome was covered by all of the genome assemblies, and these regions contain 64% of the annotated genes. These genes are defined as core genes. This percentage is similar to the 62% reported for rice in a recent study of 67 rice genomes.
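The coverage definition above can be illustrated with a short Python sketch. It is not the pipeline used in this study. It assumes that collinear blocks are available as hypothetical tuples `(assembly, chromosome, start, end)` in half-open reference coordinates, and that each assembly contributes at most once to the coverage of any position, so overlapping blocks from the same assembly are merged first. Regions whose coverage equals the number of assemblies are then reported as core.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coverage_segments(blocks):
    """Return {chrom: [(start, end, n_assemblies), ...]} for covered segments.

    `blocks` is an iterable of (assembly, chrom, start, end) tuples
    (a hypothetical input format chosen for this example).
    """
    per_assembly = defaultdict(list)
    for assembly, chrom, start, end in blocks:
        per_assembly[(assembly, chrom)].append((start, end))

    # Turn each merged interval into +1 / -1 events for a sweep line.
    events = defaultdict(list)
    for (_assembly, chrom), intervals in per_assembly.items():
        for start, end in merge(intervals):
            events[chrom].append((start, 1))
            events[chrom].append((end, -1))

    segments = defaultdict(list)
    for chrom, chrom_events in events.items():
        depth, prev = 0, None
        for pos, delta in sorted(chrom_events):
            if prev is not None and pos > prev and depth > 0:
                segments[chrom].append((prev, pos, depth))
            depth += delta
            prev = pos
    return dict(segments)


def core_regions(segments, n_assemblies):
    """Keep only segments covered by every assembly."""
    return {
        chrom: [(s, e) for s, e, depth in segs if depth == n_assemblies]
        for chrom, segs in segments.items()
    }


# Toy input: three assemblies (A, B, C) aligned to one chromosome.
toy_blocks = [
    ("A", "chr1", 0, 100),
    ("A", "chr1", 50, 150),
    ("B", "chr1", 80, 200),
    ("C", "chr1", 90, 120),
]
toy_segments = coverage_segments(toy_blocks)
toy_core = core_regions(toy_segments, n_assemblies=3)
```

In this toy input, only the interval from 90 to 120 on `chr1` is covered by all three assemblies, so it is the only core region. Intersecting such regions with gene annotations would give the fraction of core genes, but that step is left out of the sketch.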