We removed closely related samples based on the eigenvalues from the PCA, retaining 20 V. vinifera samples and 20 non-vinifera samples.
Marker design pipeline
The VCF file generated by the genus-wide variant calling was loaded into R. For each gap-free alignable region in the core genome, we computed its length, diversity, and missing rate. Regions shorter than 200 bp, regions with diversity greater than 7% or less than 2%, and regions with an average missing rate greater than 50% were discarded. These steps were conducted in R using the Bioconductor (version 3.8) package VariantAnnotation \cite{Obenchain2014}. Candidate regions were then selected to give one marker per 200 kb. If no qualifying candidate region could be found in a 1 Mbp window, we included the regions with the highest coverage in the core-genome construction. Within each 1 Mbp sliding window, additional candidate regions were randomly included in regions of high gene density. A total of 2,500 candidate regions were sent to IDT for primer design and pooling-compatibility testing. Primers could be designed for 99.6% of the regions, and 98.1% of them were compatible for pooling in a single-tube PCR. A total of 2,000 rhPCR markers were synthesized by IDT.
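The filtering thresholds and the one-marker-per-200-kb spacing rule can be illustrated with a minimal Python sketch. This is not the authors' R/VariantAnnotation code. It assumes each region has already been summarised into a length, a diversity value (as a fraction), and an average missing rate. The `Region` record and both function names are hypothetical. Regions are assigned to 200 kb bins by their midpoint, and the tie-break within a bin is also an assumption: the sketch keeps the region with the lowest missing rate, whereas the text does not specify a rule. The 1 Mbp coverage fallback and the gene-density augmentation are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical per-region summary; coordinates are 1-based and inclusive."""
    chrom: str
    start: int
    end: int
    diversity: float     # diversity as a fraction (0.02 == 2%)
    missing_rate: float  # average fraction of missing genotype calls


def passes_filters(r: Region, min_len: int = 200, min_div: float = 0.02,
                   max_div: float = 0.07, max_missing: float = 0.5) -> bool:
    """Keep regions >= 200 bp, with 2% <= diversity <= 7% and missing rate <= 50%."""
    length = r.end - r.start + 1
    return (length >= min_len
            and min_div <= r.diversity <= max_div
            and r.missing_rate <= max_missing)


def pick_one_per_bin(regions: list[Region], bin_size: int = 200_000) -> list[Region]:
    """Pick at most one region per (chromosome, 200 kb bin), keyed on the region midpoint.

    Assumed tie-break: the region with the lowest missing rate wins its bin.
    """
    best: dict[tuple[str, int], Region] = {}
    for r in regions:
        key = (r.chrom, ((r.start + r.end) // 2) // bin_size)
        if key not in best or r.missing_rate < best[key].missing_rate:
            best[key] = r
    # Return the selected markers ordered by chromosome name, then bin index.
    return [best[k] for k in sorted(best)]
```

For example, `pick_one_per_bin([r for r in regions if passes_filters(r)])` returns the filtered, evenly spaced candidates. Binning on midpoints is a simple choice that avoids counting a region spanning a bin boundary twice.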