Discrimination powers of conventional, rice-specific, and super DNA barcodes
The different genome types within the genus Oryza have generally diverged sufficiently for most molecular markers to discriminate between them. Marker resolution is therefore tested within genome types that contain more than one species. Phylogenetic methods are the most reliable way to assign a sample to a species, and the following comparisons were based on maximum parsimony phylogenies of nearly identical sample sets inferred from different molecular markers: matK, rbcL, psbA-trnH, ITS, NP78+R22, the rice-specific barcodes, and the super barcode. Because some species are narrowly or incorrectly delimited, molecular markers cannot discriminate between the following species pairs: O. alta and O. grandiglumis (Bao & Ge, 2004), O. barthii and O. glaberrima (Wang et al., 2014), O. glumipatula and O. longistaminata, O. granulata and O. meyeriana (Gong, Borromeo, & Lu, 2000), O. minuta and O. malampuzhaensis, O. nivara and O. sativa subsp. indica, and O. sativa subsp. japonica and O. rufipogon.
The matK gene had an aligned length of 1417 sites with 90 parsimony-informative characters when outgroups were included. This marker failed to discriminate between species of the A, B, and C genomes (Fig. S1).
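For readers unfamiliar with the term, a site is conventionally counted as parsimony-informative when at least two character states each occur in at least two sequences. The following is a minimal sketch of that count, not the software used in this study; the function names, the toy alignment, and the choice to ignore gaps and the ambiguity codes "?" and "N" are assumptions made for illustration.

```python
from collections import Counter

# Characters treated as missing data (an assumption of this sketch).
MISSING = "-?N"


def is_parsimony_informative(column, missing=MISSING):
    """Return True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c.upper() not in missing)
    shared_states = [state for state, n in counts.items() if n >= 2]
    return len(shared_states) >= 2


def count_informative_sites(alignment):
    """Count parsimony-informative columns in a list of equal-length sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*alignment) iterates over alignment columns (sites).
    return sum(is_parsimony_informative(col) for col in zip(*alignment))


# Hypothetical four-sequence alignment, not data from this study.
toy = ["ACGTA", "ACGTC", "ATGAC", "ATG-A"]
informative = count_informative_sites(toy)
print(informative)
```

In the toy alignment only the second and fifth columns qualify; the fourth column has one state shared by two sequences, one singleton, and a gap, so it is not counted.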
The rbcL gene had an aligned length of 1428 sites with 50 parsimony-informative characters when outgroups were considered. This marker also failed to discriminate between species of the A, B, and C genomes (Fig. S2).
The psbA-trnH region had an aligned length of 515 sites with 10 parsimony-informative characters when outgroups and partial rps19 were included. This marker could successfully identify only O. brachyantha and O. sativa subsp. indica (Fig. S3).
The nuclear ITS (including 5.8S) had an aligned length of 713 sites with 162 parsimony-informative characters when outgroups were considered. The samples used for this marker differed slightly from those used for the chloroplast markers because the sequences were difficult to amplify. Only one ITS copy was detected in several allotetraploid species. The ITS phylogeny suggested that the H or J genome types originated from the F genome type (Fig. S4), a finding not supported by the other two nuclear genes. The ITS failed to discriminate between species of the A and C genome types.
The nuclear NP78+R22 gene combination had an aligned length of 2218 sites with 722 parsimony-informative characters when outgroups were included. This marker combination failed to discriminate between species of the A, B, C, H, and J genome types (Fig. S5).
The rice-specific barcode consisted of six hypervariable chloroplast regions and had an aligned length of 7943 sites with 603 parsimony-informative characters when outgroups were considered. This marker combination resolved almost all species except O. punctata and O. minuta of the B genome type (Fig. S6).
Finally, the super DNA barcode of the complete chloroplast genome had an aligned length of 145,860 sites with 5048 parsimony-informative characters when outgroups were included. The super barcode exhibited the highest discriminating power, resolving all species using an insensitive but extremely reliable phylogenetic method (Fig. 1). Even though species of genome types A and C are very closely related and difficult to identify, the super barcode resolved them sufficiently well. Surprisingly, the pairs O. rufipogon + O. sativa subsp. japonica and O. nivara + O. sativa subsp. indica were separable using the super barcode.
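The aligned lengths and parsimony-informative counts reported above can be placed on a common scale by expressing each count as a fraction of the aligned length. The sketch below does only that arithmetic; the ratio is a derived quantity rather than a statistic reported in this study, the variable and function names are illustrative, and a higher density of informative sites should not be read as a measure of discriminating power, which depends on where the variation falls among species.

```python
# (aligned length, parsimony-informative characters) as reported in the text.
markers = {
    "matK": (1417, 90),
    "rbcL": (1428, 50),
    "psbA-trnH": (515, 10),
    "ITS": (713, 162),
    "NP78+R22": (2218, 722),
    "rice-specific barcode": (7943, 603),
    "super barcode": (145860, 5048),
}


def informative_fraction(length, informative):
    """Fraction of aligned sites that are parsimony-informative."""
    return informative / length


# Percentage of informative sites per marker, rounded to two decimals.
percentages = {
    name: round(100 * informative_fraction(length, informative), 2)
    for name, (length, informative) in markers.items()
}

for name, pct in percentages.items():
    length, informative = markers[name]
    print(f"{name}: {informative}/{length} sites = {pct}%")
```

The comparison illustrates that the super barcode owes its large absolute number of informative characters mainly to its length, since its per-site density is similar to that of rbcL.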