rhAmpSeq sequencing and genotyping

The rhAmpSeq PCR with 2000 primers was conducted using the IDT rhAmpSeq high-throughput protocol (v2.1). Briefly, the first PCR used 14 cycles with an annealing temperature of 61 °C for each sample. The PCR products were diluted 1:20 and indexed with IDT indexing primers using 24 cycles with an annealing temperature of 60 °C. The indexed PCR products were pooled, cleaned with Agencourt AMPure beads, quantified, and sequenced on an Illumina MiSeq or NextSeq (Illumina, San Diego, CA, USA) with paired-end 2 × 150 bp runs.

Paired-end rhAmpSeq sequencing data were generated for all four genetic mapping families as described by Yang et al. (2016)\cite{Yang2016}. The analyze_amplicon.pl Perl script (https://github.com/avinashkarn/analyze_amplicon/blob/master/analyze_amplicon.pl) was used to identify the haplotype variants at each locus across all vines in each of the four families and to generate a haplotype-to-genotype (hapgeno) file for each family. Monomorphic markers and markers with more than 75% missing data in the hapgeno file were manually removed from further analysis. Finally, a custom Perl script, haplotype_to_VCF.pl (https://github.com/avinashkarn/analyze_amplicon/blob/master/haplotype_to_VCF.pl), was used to convert the four most frequent haplotype alleles of each marker in the hapgeno file to a VCF file, in which each haplotype allele of a marker was represented as a pseudo "ACGT" allele. This conversion allowed the markers to be used in the marker validation analyses described hereafter.
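The marker filtering and the pseudo-allele recoding can be illustrated with a minimal Python sketch. It is not the haplotype_to_VCF.pl script and does not read the actual hapgeno file format. The in-memory layout (a dict mapping each marker to a list of per-vine genotypes, each a tuple of haplotype IDs or None when missing), all function names, and the choice to set genotypes carrying an allele outside the top four to missing are assumptions made for illustration only.

```python
from collections import Counter

PSEUDO_BASES = "ACGT"  # one pseudo-base per retained haplotype allele


def missing_fraction(calls):
    """Fraction of vines with no genotype call (None) at one marker."""
    return sum(c is None for c in calls) / len(calls)


def is_monomorphic(calls):
    """True if at most one distinct haplotype allele is observed."""
    alleles = {a for c in calls if c is not None for a in c}
    return len(alleles) <= 1


def filter_markers(hapgeno, max_missing=0.75):
    """Drop monomorphic markers and markers with > max_missing missing data."""
    return {
        marker: calls
        for marker, calls in hapgeno.items()
        if missing_fraction(calls) <= max_missing and not is_monomorphic(calls)
    }


def pseudo_allele_map(calls, n_top=4):
    """Map the n_top most frequent haplotype alleles to A, C, G, T."""
    counts = Counter(a for c in calls if c is not None for a in c)
    # Sort by descending count; ties are broken by allele ID for determinism.
    ranked = sorted(counts, key=lambda a: (-counts[a], a))
    return {a: PSEUDO_BASES[i] for i, a in enumerate(ranked[:n_top])}


def recode(calls, mapping):
    """Recode genotypes as pseudo-bases.

    Assumption: a genotype carrying an allele outside the mapping is
    treated as missing.
    """
    recoded = []
    for c in calls:
        if c is None or any(a not in mapping for a in c):
            recoded.append(None)
        else:
            recoded.append("/".join(mapping[a] for a in c))
    return recoded


# Hypothetical toy data: three markers scored on five vines.
hapgeno = {
    "m1": [(1, 2), (1, 1), (2, 3), None, (1, 2)],
    "m2": [(1, 1), (1, 1), (1, 1), (1, 1), (1, 1)],  # monomorphic
    "m3": [(1, 2), None, None, None, None],          # 80% missing
}
kept = filter_markers(hapgeno)
mapping = pseudo_allele_map(kept["m1"])
recoded_m1 = recode(kept["m1"], mapping)
```

In this toy data set only m1 survives filtering, and its three haplotype alleles are recoded as A, C, and G in order of decreasing frequency.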

Imputation and Filtering

The raw converted VCF files for each grapevine family were imported into TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) 5.2.51\cite{Bradbury2007}, and the genotypes were imputed using the LD-kNNi imputation plugin, also known as LinkImpute\cite{Money_2015}, with the default parameters (High LD Sites = 30, Number of nearest neighbors = 10, and Max distance between site to find LD = 10,000,000). After imputation, vines with >90% missing data were removed from the analysis.
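To show what the "High LD Sites" and "Number of nearest neighbors" parameters control, the following Python sketch outlines the general idea of LD-based k-nearest-neighbor imputation, followed by a per-vine missing-data filter. It is a simplified illustration and not the TASSEL or LinkImpute implementation. Genotypes are assumed to be coded as 0/1/2 alternate-allele counts with None for missing, LD is approximated by the squared correlation of genotype codes, the sample distance is a mean absolute difference, and the maximum physical distance between sites is ignored. All function names are hypothetical.

```python
def r2(x, y):
    """Squared correlation between two sites over samples called at both."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return sxy * sxy / (sxx * syy)


def impute_one(geno, sample, site, n_ld_sites=30, n_neighbors=10):
    """Impute geno[site][sample]; geno is indexed as geno[site][sample]."""
    target = geno[site]
    # 1. Choose the sites in highest LD with the site being imputed.
    others = [s for s in range(len(geno)) if s != site]
    others.sort(key=lambda s: -r2(target, geno[s]))
    ld_sites = others[:n_ld_sites]
    # 2. Measure the distance to every sample that has a call at the target.
    candidates = []
    for j, call in enumerate(target):
        if j == sample or call is None:
            continue
        diffs = [
            abs(geno[s][sample] - geno[s][j])
            for s in ld_sites
            if geno[s][sample] is not None and geno[s][j] is not None
        ]
        if diffs:
            candidates.append((sum(diffs) / len(diffs), j, call))
    candidates.sort()
    # 3. Let the nearest neighbors vote, weighted by inverse distance.
    votes = {}
    for dist, _, call in candidates[:n_neighbors]:
        votes[call] = votes.get(call, 0.0) + 1.0 / (dist + 1e-6)
    return max(votes, key=votes.get) if votes else None


def keep_samples(geno, max_missing=0.90):
    """Indices of samples (vines) whose missing fraction is <= max_missing."""
    n_sites = len(geno)
    return [
        j
        for j in range(len(geno[0]))
        if sum(geno[s][j] is None for s in range(n_sites)) / n_sites <= max_missing
    ]


# Hypothetical toy matrix: 3 sites x 5 vines, one missing call at site 0.
geno = [
    [0, 0, 2, 2, None],
    [0, 0, 2, 2, 2],
    [0, 2, 0, 2, 0],
]
imputed = impute_one(geno, sample=4, site=0)
retained = keep_samples(geno)
```

In the toy matrix, site 1 is in complete LD with site 0, so the vines that match vine 4 at site 1 dominate the vote and the missing call is filled in from their genotypes.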