We removed closely related samples based on the eigenvalues from the PCA and kept 20 V. vinifera samples and 20 non-vinifera samples.

Marker design pipeline

The VCF file generated from the genus-wide variant calling was loaded into R. For each non-gap alignable region in the core genome, we checked its length, diversity, and missing rate. Regions that were shorter than 200 bp, had a diversity larger than 7% or smaller than 2%, or had an average missing rate larger than 50% were dropped. These steps were conducted in R using the Bioconductor (version 3.8) package VariantAnnotation \cite{Obenchain2014}. Candidate regions were then selected to ensure one marker per 200 kb. If no qualifying candidate region could be found in a 1 Mbp window, we included the regions that had the highest coverage in the core-genome construction. For each 1 Mbp sliding window, we randomly included additional candidate regions where gene density was high. A total of 2500 candidate regions were sent to IDT for primer design and pooling-compatibility testing. Primers could be designed for 99.6% of the regions, and 98.1% of those were pooling compatible in a one-tube PCR. A total of 2000 rhPCR markers were synthesized by IDT.
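The filtering and spacing rules above can be illustrated with a minimal Python sketch. It is not the R/VariantAnnotation code used in this work. The `Region` record, the function names, and the example values are hypothetical. Diversity and missing rate are assumed to be precomputed per region as fractions. The tie-break within a 200 kb bin (lowest missing rate) is an assumption, since the text does not state how a single region was chosen. The 1 Mbp fallback and the gene-density enrichment steps are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical record for one non-gap alignable core-genome region."""
    chrom: str
    start: int           # 0-based start coordinate, in bp
    end: int             # exclusive end coordinate, in bp
    diversity: float     # fraction, e.g. 0.03 for 3%
    missing_rate: float  # average missing rate as a fraction

    @property
    def length(self) -> int:
        return self.end - self.start


def passes_filters(r, min_len=200, min_div=0.02, max_div=0.07, max_missing=0.50):
    """Keep regions >= 200 bp, with 2% <= diversity <= 7% and missing rate <= 50%."""
    return (
        r.length >= min_len
        and min_div <= r.diversity <= max_div
        and r.missing_rate <= max_missing
    )


def pick_one_per_bin(regions, bin_size=200_000):
    """Keep one passing region per 200 kb bin on each chromosome.

    Assumption: within a bin, the region with the lowest missing rate wins.
    """
    best = {}
    for r in regions:
        if not passes_filters(r):
            continue
        key = (r.chrom, r.start // bin_size)
        if key not in best or r.missing_rate < best[key].missing_rate:
            best[key] = r
    return sorted(best.values(), key=lambda r: (r.chrom, r.start))


# Made-up example regions, for illustration only.
regions = [
    Region("chr1", 1_000, 1_300, 0.03, 0.10),      # passes, bin 0
    Region("chr1", 5_000, 5_150, 0.03, 0.10),      # dropped: 150 bp
    Region("chr1", 9_000, 9_400, 0.09, 0.10),      # dropped: diversity > 7%
    Region("chr1", 20_000, 20_500, 0.04, 0.05),    # passes, bin 0, lower missing rate
    Region("chr1", 250_000, 250_300, 0.05, 0.60),  # dropped: missing rate > 50%
    Region("chr1", 410_000, 410_250, 0.02, 0.50),  # passes at the thresholds, bin 2
]
selected = pick_one_per_bin(regions)
```

Binning by start coordinate is the simplest way to approximate "one marker per 200 kb". It does not guarantee a minimum distance between markers in adjacent bins.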