Figure 1. The evolution of the Gossypium genus and the formation of tetraploid Gossypium species. MB values show the genome size of representative species.
3. Domestication of Upland Cotton
After polyploidization, G. hirsutum evolved to produce high-quality fiber and to survive adverse environments [7]. The domestication history of G. hirsutum is similar to that of the other three domesticated cotton species; indigenous peoples may have gathered lint fibers and used them for string and other textile items [7,20,21]. Domestication of cotton gave rise to long lint fiber, which has a flat, convoluted ribbon shape that allows it to be spun into yarn [20,22]. Short ‘linters’ or fuzz stick tightly to the seed coat, whilst longer ‘lint’ fibers cling to it loosely. Fuzz fibers are an important source of raw material for paper and other industrial products. Cotton is the world’s most important fiber crop because of its longer, spinnable lint fibers, and these novel single-celled seed epidermal trichomes may have attracted ancient peoples to the cotton plant in the first place. Four separate Gossypium species were domesticated independently by four civilizations on two continents, as previously stated: the A-genome diploids G. herbaceum and G. arboreum were domesticated in Africa and Asia, and the allopolyploids G. hirsutum and G. barbadense were domesticated in Central and South America [20,22,23]. Wild cottons have short, coarse lint fibers that are very different from those of current cultivars. Although the elongated seed trichomes may act as a dispersal mechanism in some ecological circumstances and/or help maintain a suitable microbiological and hydration environment for seed germination or early seedling development, the biological function of lint fiber has not been determined [24]. Early domesticators were likely drawn to primitive lint fiber; the long lint fiber of current germplasm results from human selection of genotypes with enhanced fiber quality attributes, as well as higher lint yield and other agronomic qualities [23,25].
Interestingly, the evolution of long spinnable lint fiber is directly linked to the elongation of fiber cells during development; in particular, developing fiber cells in domesticated diploid and tetraploid cotton all exhibit a protracted fiber-cell growth program. Only the F-genome/A-genome lineages developed this developmental novelty, which may have aided the domestication of A-genome cotton [26,27]. When the A-genome joined the D-genome during polyploidization, this predisposition for extended lint fiber growth was passed on to the allopolyploids [28]. G. hirsutum is thought to have been domesticated in Mesoamerica’s Yucatan Peninsula. According to Brubaker and [29], race ‘punctatum’ is the earliest domesticated form of G. hirsutum, having agronomic characteristics that are transitional between the truly wild race ‘yucatanense’ and races with more advanced attributes such as ‘latifolium’ and ‘palmeri’. Early attempts at domestication may have involved choosing more appealing plants from the wild population to generate door-yard cultigens, which later evolved into large-scale field production as civilization grew and agriculture became more specialized. Traits that early agriculturalists may have preferred include increased lint yield through larger and more numerous bolls per plant, a reduction in plant size from the shrubby or small treelike habit of wild cotton to a manageable scale, and a more annual life cycle. Later selection concentrated on finer, stronger, and consistently longer fibers to improve fiber quality. Other fiber properties such as elongation and short fiber content have grown increasingly important as the textile industry has become more automated. As agriculture and agribusiness converge, the industrialization of production will continue to define traits that increase production efficiency.
For example, growing mechanization required selection for plant size, and the vulnerability of monoculture to pests and diseases necessitated stronger selection for resistance to these challenges. Ware [30] has published a thorough historical account of G. hirsutum as Upland cotton cultivars from the time of European arrival to the middle of the twentieth century. Although the specific location of G. hirsutum domestication is unclear, the birth and development of Upland germplasm, the current cultivated form of G. hirsutum, occurred in the southern United States of America. The ‘Cotton Belt’ of the United States was the epicenter of Upland cotton’s genetic development. The cotton crop was brought to the eastern coastal areas of North America with the arrival of Europeans. While all four domesticated Gossypium species were planted as crops in the United States from the start, the allotetraploids outperformed the A-genome diploids. As the twentieth century progressed, a new categorization system based on geographic areas and industrial techniques emerged, which is still in use today: Acala type, Plains type, Delta type, and Eastern type [31]. Prior to the establishment of the US cotton industry, the Asiatic diploid species G. arboreum and G. herbaceum, with much shorter staple lengths than Upland or Sea Island cotton, were grown in cotton-producing nations such as India, China, and Russia [32]. The new spinning methods available at the time could not handle the fiber produced by G. arboreum and G. herbaceum. While there has been some success in breeding for longer fiber within the diploid species [33], more Upland germplasm was introduced to meet the need for new varieties with better fiber quality. By the 1920s, practically all worldwide breeding efforts had shifted from the short-fiber Asiatic species to the allopolyploid G. hirsutum [30].
4. Cotton Improvement
Since Niles and Feaster’s report, the great majority of cotton breeding efforts have continued to use the same technique [9]. Cotton is mostly a self-pollinated plant; hence, most cotton breeders use a modified pedigree breeding approach to generate pure-line cultivars. In general, parents with different traits of interest are chosen for crossing, segregating populations are evaluated in the field to identify individual plants with the desired trait combinations, seed from the selected plants is evaluated in progeny rows, and inbred lines that outperform “check” cultivars are evaluated in replicated tests over multiple locations and years [34,35]. The procedures of cotton breeding programs have been described in detail in the literature [36–38]. Current breeding operations make use of substantially more machinery, allowing them to handle a larger number of progeny rows with fewer people, and of more sophisticated fertilization, plant growth control, and pest management. Although their relative importance may have changed, the traits desired in present cultivars are not substantially different from those of the earlier years of cotton cultivation [37]. In every commercial cotton breeding program, lint yield remains the top objective [39–41]. Lint percent, which is a component of lint yield, was probably one of the first traits to be selected during domestication and early breeding, and it is still the most desired character [23,42–46]. Environmental stability [47,48] and early maturity [49–51] are two further agronomic traits that have grown in importance [37]. Although the advent of transgenic Bt cultivars has shifted attention away from insect resistance, host-plant resistance remains important for a variety of diseases and nematodes.
Similarly, breeding efforts used to prioritize cotton genotypes with low vegetative growth and erect stature, but plant height and compact growth habit can now be readily regulated with plant growth regulators such as mepiquat [52]. Fiber quality was the second most important aim in early breeding efforts, and this is still the case in commercial cotton breeding programs [53–55]. The motivation came from the replacement of manual spinning with machine spinning and weaving technologies, which required adequate fiber length and strength to work properly. The quality of cotton fibers is determined by their physical characteristics. Lint fiber is typically spun into yarn, which is subsequently woven or knitted into a variety of textiles depending on the quality and desired end-product characteristics. Fiber quality refers to the set of fiber characteristics that determine the efficiency of yarn spinning, weaving, and other fabric-making processes, as well as the quality of cotton textiles. The key fiber properties that are strongly associated with spinning performance and end-product quality are length, strength, elongation, and fineness/maturity (measured as micronaire). The relevance of fiber characteristics and how they are quantified was outlined by Chee et al. (2009) [56].
Lint yield and fiber quality are both quantitatively inherited traits. Campbell et al. [57] reported the mean broad-sense and narrow-sense heritability for yield and fiber quality variables. Yield components and fiber quality attributes are both heritable and exhibit additive genetic variation, but they are frequently negatively correlated. It has been suggested that this correlation is due to linkage rather than pleiotropy; however, Campbell et al. [58] found that the negative correlation in Pee Dee germplasm has persisted after more than 80 years of breeding. As a result, improving yield and fiber quality simultaneously is the most difficult problem in cotton breeding.
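To make the distinction between the two heritability estimates concrete, the following minimal sketch computes both from variance components. It is an illustration only: the function name and the variance values are hypothetical and are not taken from Campbell et al. [57], and genotype-by-environment interaction is assumed to be absent. Broad-sense heritability is the ratio of total genetic variance to phenotypic variance, whereas narrow-sense heritability uses only the additive component.

```python
def heritabilities(var_additive, var_dominance, var_epistatic, var_environment):
    """Return (broad-sense H2, narrow-sense h2) from variance components.

    Assumes phenotypic variance = genetic variance + environmental variance,
    i.e. no genotype-by-environment interaction term (a simplification).
    """
    var_genetic = var_additive + var_dominance + var_epistatic
    var_phenotypic = var_genetic + var_environment
    if var_phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_genetic / var_phenotypic, var_additive / var_phenotypic


# Hypothetical variance components for a fiber trait (illustrative only).
H2, h2 = heritabilities(4.0, 1.0, 0.5, 4.5)
print(f"broad-sense H2 = {H2:.2f}, narrow-sense h2 = {h2:.2f}")
```

Because only additive variance is reliably passed on in pure-line breeding, the narrow-sense value is usually the more informative of the two for predicting the response to selection.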
Meredith and Bridge (1973) tested the performance of four cotton (G. hirsutum L.) cultivars under four environmental conditions and estimated seven yield components and seven fiber properties using nine different harvests. The lint index was lowest for the early harvest and highest for the middle harvests, while cultivars were the most important source of variation for fiber properties, indicating the importance of genetic variation for the improvement of fiber quality traits [59]. Later, they used a modified recurrent selection method to improve lint percentage within the cotton (G. hirsutum L.) cultivar ‘Deltapine 523’, starting with plant-based selection, followed by progeny-row selection and the construction of S1, S2, and S3 selfed generations. They finally obtained eight S3 progenies with 2.5% higher span length, and lint yield, lint percentage, fiber length, lint index, and micronaire were significantly improved in the S3 population compared to the S0 population [60]. In reciprocal backcross populations of the S6 generation of G. hirsutum × G. barbadense crosses, with G. barbadense used as a donor parent, significant genotypic variation was observed; fiber-quality-related traits, including micronaire, fiber elongation, fiber strength, and upper half mean length, showed significantly higher genotypic variance in the G. hirsutum background than in the G. barbadense background, indicating cytoplasmic effects on genetic variation and heredity [61]. A growing body of evidence supports the adaptive role of alien introgression for different traits, especially fiber quality, in G. hirsutum [62–68].
5. Development of Spinnable Fiber and Polyploidization
In terms of fiber quality improvement, it is worth noting that spinnable fiber appeared just once in the history of Gossypium, in an ancestor of the two domesticated diploid A-genome species, after the F-genome lineage split. This feature was handed down to allopolyploid cotton when the A-genome merged in a shared nucleus with a D-genome from an ancestor that did not make lint fibers [27,69]. As previously stated, the formation of long spinnable lint fiber is closely linked to a protracted elongation phase during fiber cell development.
Applequist, Cronn and Wendel [26] used accessions from the AD-genome allopolyploids G. hirsutum and G. tomentosum, as well as the diploid species G. herbaceum (A-genome), G. arboreum (A-genome), G. raimondii (D-genome), G. davidsonii (D-genome), G. anomalum (B-genome), and G. stur (F-genome). A comparison of growth curves across species showed that accessions from the AD-genome allopolyploids and the A-genome diploids have a much higher rate of fiber elongation than the other diploids. Hovav et al. [27] came to a similar conclusion after comparing gene expression profiles over a developmental time-course of fiber from G. herbaceum and G. longicalyx. In domesticated A-genome G. herbaceum, their findings revealed significant changes in the expression of genes associated with stress responses and cell elongation, as well as a longer developmental profile. Thus, the evolution of lint fiber included the continuation of an ancient developmental program that developed in the ancestral A-genome prior to polyploidization. In addition to supporting the evolution of a protracted period of fiber elongation, polyploidization has another major significance for the evolution of spinnable fiber. The joining of the A- and D-genomes in a single nucleus may have allowed D-genome alleles (genes) to be recruited into fiber development, resulting in higher fiber quality and yield in polyploid cotton. Because only A-genome diploid species produce spinnable fiber, the role of the D-genome in the genetic determination of fiber quality in polyploid cotton has long been debated. Jiang et al. [70] presented the first evidence of the extent to which loci on the Dt-subgenome contribute to genetic variation in fiber quality traits, showing that the majority of QTLs for fiber quality map to the Dt-subgenome.
Numerous genetic mapping analyses, summarized by [56], have now supported the observation that the Dt-subgenome, from the ancestor that did not have spinnable fiber, plays a large role in the genetic control of fiber growth and development in polyploid cotton. These findings show that Dt-subgenome genes have been recruited to the genetic regulation of fiber quality traits, leading to the transgressive fiber quality and yield of polyploid cotton compared to its diploid progenitors. According to [70], many advantageous alleles at important loci for fiber properties may have already been fixed by natural selection, because the At-subgenome has a significantly longer history of selection for fiber formation. On the other hand, fiber growth loci on the Dt-subgenome may not have been subjected to strong selection until after polyploidization, and hence mutations that improved fiber may have only become advantageous after polyploidization. As a result, the Dt-subgenome may have had greater ‘room for improvement’ of fiber qualities when artificial selection was more recently imposed by domestication and breeding. Recruitment of Dt-subgenome loci may give polyploid cotton more flexibility to respond to artificial selection through breeding, which would explain why polyploid cotton has better fiber characteristics than cultivated A-genome diploids. The uneven pace of evolution of the two subgenomes in the AD polyploid is a second consequence of polyploidization for spinnable fiber and for a host of other traits that geneticists are only beginning to understand through genome sequence analysis. By definition, most, if not all, loci are duplicated in allopolyploid genomes. Lynch and Conery [71] addressed the different fates of duplicated genes, concluding that most duplicated genes go through a brief period of relaxed selection before being silenced or pseudogenized.
However, a small percentage of duplicated genes survive in duplicate and contribute to the development of phenotypic complexity through natural selection. In recent allopolyploids such as cotton, where the two subgenomes contain duplicated but somewhat divergent copies of most genes, the null hypothesis is that homoeologous genes evolve independently and at comparable rates following polyploid formation. In contrast to this expectation, allopolyploid cotton has undergone bi-directional concerted evolution for some genes, resulting in differing directional biases in different sections of the genome. The first clues to this phenomenon came from in situ hybridization, which revealed that dispersed repetitive sequences that are A-genome specific at the diploid level had spread to the Dt-subgenome in allopolyploids [72,73]. Furthermore, multiple studies have found that the Dt-subgenome has much greater allelic diversity of homoeologous genes [74] and loci affecting quantitative traits than the At-subgenome [18,75]. According to recent genome sequence comparisons, non-reciprocal DNA conversion favors genes in the Dt-subgenome over genes in the At-subgenome [76]: for example, the sequences of around 40% of the At and Dt genes in an elite cotton cultivar differ from those of their diploid ancestors. Most of these mutations are convergent, with At genes being converted to the Dt state at a rate more than double that of the reciprocal [9]. These findings suggest that polyploidization allowed D-genome genes to take on new roles in the allopolyploid genome, potentially explaining why domesticated allopolyploid cotton outperforms its diploid progenitors in terms of agronomic performance and fiber quality.
6. Gene Introgression and Inter-Specific Hybridization
Each phase in the domestication of cotton, from wild G. hirsutum to feral cultigens to the advent of Upland germplasm to modern improved cultivars, imposed severe genetic bottlenecks, limiting allelic diversity. Morphological [77] and molecular characterization have been used to document the extent and patterns of genetic erosion associated with the formation of early cultigens, landraces, and current cultivars [78,79]. Many genomic regions and genes introgressed from wild relatives have been reported to contribute to the improvement of Upland cotton [8,65,80].
The average genetic distance across 378 Upland accessions from the United States examined with 120 SSR markers was only 0.195, demonstrating that the genetic base of Upland cotton germplasm is quite narrow [79]. Lubbers and Chee [81] used 250 RFLPs to examine 320 Upland cultivars/germplasm lines from the United States National Plant Germplasm Collection and found cotton to have less genetic diversity than most important crops. The amount of genetic diversity did not increase when elite germplasm from various geographical origins was assessed. Indeed, the average number of alleles detected per locus in a survey of 157 elite cultivars from China, the United States, Africa, the Former Soviet Union, and Australia using 146 SSR loci was just 2.3 [82]. Surprisingly, despite this narrow genetic base, several key cotton traits, particularly fiber quality, have shown sustained improvement. Given the low allelic diversity in the Upland cotton gene pool, it is reasonable to assume that few favorable alleles for fiber quality (such as fiber length and strength) have yet to reach fixation, as these traits have been under intense selection pressure since the early stages of domestication. As a result, it is not surprising that interspecific introgression has long been a topic of discussion in the Upland cotton community [83,84]. Upland cotton breeding has prioritized high yield and adaptability, whereas breeding of domesticated forms of G. barbadense, often known as Pima, Egyptian, or Sea Island cotton, has prioritized improved fiber quality. As a result, the fiber of cultivated G. barbadense is substantially longer, finer, and stronger than that of the more extensively grown Upland cotton. However, of the G. barbadense types that are still in production, both Pima and Egyptian cotton have a restricted range of environmental adaptability, being confined to irrigated regions in dry zones of the Western United States and Lower Egypt, respectively.
Nonetheless, this species’ distinctive fiber qualities make it an attractive source of additional genetic variation for improving Upland cotton fiber quality. It is no surprise, therefore, that studies of populations derived from interspecific hybridization between wild and domesticated forms of G. barbadense and Upland cotton have investigated the genetic basis and heritability of fiber properties [56,85]. Merging genomes across species provides an opportunity to introduce beneficial foreign genes for crop improvement and genetic analysis. Saha et al. (2006) developed monosomic and monotelodisomic substitution hybrids between G. hirsutum and G. tomentosum, identified several types of numerical and structural variation, and provided valuable germplasm for the localization of genomic markers and the development of backcross substitution lines for cotton cultivar improvement [86]. Recently, Muthuraj et al. (2019) developed male-sterile triploid interspecific hybrids between tetraploid G. hirsutum and diploid G. armourianum, which showed intermediate phenotypes; this germplasm is an important genetic source for introducing genes for resistance to the sucking pest “jassid” into cultivated cotton through conventional breeding schemes [87]. To enable barrier-free introgression of wild genes into cultivated cotton, a tri-species hybrid “(G. arboreum × G. anomalum) × G. hirsutum” was produced. Cytomorphological analysis of the tri-species hybrid and its backcross progenies to G. hirsutum showed chromosome configurations ranging from monovalents to hexavalents and allosyndetic chromosome pairing, indicating the possibility of intergenomic genetic exchange and a homoeologous relationship among these species [88].
As expected, the molecular marker data combined with the cytogenetic findings determined the multi-genome background in the monovalent to hexaploid progenies and provided an important intermediate material for exotic gene introgression [88]. Draye et al. (2005) used a backcross-self population of a cross between G. hirsutum and G. barbadense and identified 32 and 9 QTLs for fiber fineness and micronaire, respectively; of the nine micronaire QTLs, seven were also associated with fiber fineness. However, the majority of the members of the BC3F2 population showed inferior phenotypes, which hinders the use of G. barbadense in conventional plant breeding programs [89]. Breeding with a G. barbadense-introgressed line in G. hirsutum showed high mid-parent heterosis for yield, and F1 to F3 hybrids outperformed the high-yielding commercial cultivar [90], indicating the suitability of introgressed lines for developing competitive cultivars. Remarkable progress was made by Hulse-Kemp et al. (2015), who developed CottonSNP63K, an Illumina Infinium array with 45,104 intraspecific and 17,954 interspecific putative SNP markers, and generated two high-density genetic maps, collectively providing new resources for cotton breeders [91]. Later, Hinze et al. (2017) used the CottonSNP63K array and validated that it could distinctly separate G. hirsutum from other Gossypium species, distinguish the wild from the cultivated types of G. hirsutum, and identify loci possibly linked to cotton seed protein content [92]. As chromosome segment substitution lines (CSSLs) provide an ideal opportunity to map QTLs in interspecific hybrids, a CSSL population derived by hybridizing and backcrossing G. hirsutum and G. barbadense was genotyped by whole-genome re-sequencing; 64 QTLs for 14 agronomic traits were identified, and many alleles of G. barbadense showed extremely high values for improving cotton seed pool contents [93].
Recombinant inbred line populations derived from a cross between the Chinese G. barbadense cultivar 5917 and the American Pima S-7 were tested for lint yield and fiber quality traits and genotyped using genotyping-by-sequencing (GBS). In total, 42 QTLs were identified, including 24 QTLs on 12 linkage groups for fiber quality and 18 QTLs on 7 linkage groups for lint yield, thereby providing initial material for the fine mapping of QTLs, prediction of candidate genes, and development of molecular markers.
The majority of these genetic investigations have shown that fiber properties are heritable [94–96], implying that interspecific introgression might improve specific Upland cotton fiber traits. In addition to fiber quality, wild and domesticated allotetraploid Gossypium species are a significant source of disease and pest resistance genes that might be transferred into Upland cotton [65,97–99]. Pathogens, nematodes, and insects cause severe crop losses wherever cotton is grown, and crop protection expenditures account for a large percentage of the high unit cost of cotton production, which is why transgenic pest-management cultivars are so appealing. According to Meredith Jr [100], breeding for disease resistance is more important than breeding for pest resistance in most breeding programs. This is especially true since the development of insect-resistant Bt cotton varieties. Some resistance traits from wild species are simply inherited, and breeders have taken advantage of them because they are easy to select. However, many resistance traits are quantitatively inherited, and using DNA markers to manipulate them has recently become considerably more successful. Disease resistance genes found in wild and domesticated allopolyploid Gossypium are summarized in Table 1. After polyploid formation during the Mid-Pleistocene, around 1–2 Mya, G. hirsutum, G. barbadense, and the wild allopolyploid species diverged from a common ancestor. Mutations that have accumulated in the different allopolyploid lineages may or may not interact favorably when introduced into a different genetic background (Orr 1995). While the allopolyploid species are sexually compatible, later-generation hybrids sometimes exhibit partial reproductive barriers such as lower fertility, segregation distortion (non-Mendelian inheritance), and hybrid breakdown [101]. Jiang et al.
[102] employed DNA markers to analyze the transmission genetics of an advanced-backcross interspecific hybrid population between G. hirsutum and G. barbadense, demonstrating the effects of these barriers on gene introgression across allotetraploid cotton. Individual allele transmission patterns often favored the elimination of the donor genotype, maintaining the integrity of the recurrent genotype. For example, segregation distortion was common, and under-representation of donor alleles in the early generations of hybridization resulted in the complete elimination of certain donor alleles as early as the BC3 generation. Interestingly, the segregation ratios at identical loci introgressed into different, independently derived BC3F2 families were highly variable, with some families favoring recurrent parent alleles while others favored donor parent alleles, implying that hybrid incompatibility is best explained by multi-locus epistatic interactions affecting gamete success and genotype fecundity. These findings illustrate the challenge of using interspecific populations to generate superior Upland cotton cultivars by pyramiding numerous beneficial alleles for quantitative traits, including lint yield, fiber length, fiber strength, and fiber fineness, in a single genotype. As observed by Brown and Ware [103], most attempts to directly cross Upland cotton with Sea Island (G. barbadense) varieties revealed that while the F1 generation is beautiful, the F2 and F3 generations are regarded as “messy” and are nearly always abandoned (Figure 2). While pedigree research revealed that several cultivars were created through interspecific hybridization, molecular studies using isozymes and DNA markers revealed that the Upland cotton gene pool is rather uniform. Rare alleles are found in just a few closely related cultivars within a germplasm group and are thought to have arisen by introgression [6,104]. Furthermore, none of these G.
barbadense-introgressed alleles were discovered in current cotton cultivars, suggesting that the potential benefits of G. barbadense introgression in Upland cotton have yet to be realized. Conversely, the introduction of Upland cotton genes into Pima cotton, a cultivated form of G. barbadense, has substantially improved the yield and adaptability of current Pima cultivars [105,106].
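The segregation distortion described above is usually detected by comparing observed genotype counts at a marker with their Mendelian expectation. The sketch below is a generic chi-square goodness-of-fit test, not the specific procedure used by Jiang et al. [102]; it assumes a codominant marker segregating 1:2:1 within an F2-type family (for example a BC3F2 family derived by selfing a heterozygous plant), and the function name and plant counts are hypothetical.

```python
import math


def segregation_chi_square(n_recurrent, n_het, n_donor):
    """Chi-square goodness-of-fit test of genotype counts against 1:2:1.

    Returns (chi-square statistic, p-value). With three genotype classes the
    test has 2 degrees of freedom, for which the chi-square survival
    function is exactly exp(-x / 2).
    """
    observed = (n_recurrent, n_het, n_donor)
    total = sum(observed)
    if total == 0:
        raise ValueError("no plants scored")
    expected = (total * 0.25, total * 0.5, total * 0.25)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.exp(-chi2 / 2)


# Hypothetical counts (recurrent homozygote, heterozygote, donor homozygote)
# for one marker scored in two families of 100 plants each.
families = {"family_A": (40, 50, 10), "family_B": (24, 52, 24)}
for name, counts in families.items():
    chi2, p = segregation_chi_square(*counts)
    verdict = "distorted" if p < 0.05 else "consistent with 1:2:1"
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.4f} ({verdict})")
```

In this made-up example, family_A shows a deficit of donor homozygotes, the pattern that would erode donor alleles over generations, while family_B fits Mendelian expectations. Running such a test per locus and per family is one simple way to expose the family-to-family variability noted in the text.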
Figure 2. Floral abnormality in later generation hybrids between Gossypium hirsutum and G. barbadense: (A) abnormal style, stigma, and anther formation; (B,C) abnormal bud development; and (D) an abnormal flower of an F2 generation of Gossypium hirsutum × G. barbadense.
7. Introgressive Breeding
In the early days, interspecific populations derived from crosses between G. hirsutum and G. barbadense solved the classic problem of limited genetic diversity in Upland cotton for genetic and QTL mapping. In addition to providing the DNA-level polymorphism required for genetic map construction, these two cultivated species are bred for somewhat different traits. As previously stated, G. hirsutum breeding has stressed maximum yield and broad adaptability, whereas G. barbadense breeding has emphasized fiber quality. As a result, most genetic mapping [15,107] and molecular quantitative genetic studies of fiber properties have used populations derived from interspecific hybridization involving wild and domesticated forms of G. barbadense crossed with Upland cotton. QTLs for numerous fiber quality traits mapped in cotton were summarized by Chee and Campbell [57]. Some important QTLs for fiber length, strength, and fineness have now been found, and several have been verified [108,109]. This knowledge gives cotton breeders more options to improve specific fiber properties in Upland cotton by introducing genes from G. barbadense with minimal disruption to the favorable allelic combinations built up over a century of selection. Inbred backcross populations generated by crossing Upland varieties with the allotetraploid species G. barbadense, G. tomentosum, and G. mustelinum have been developed as part of a collaborative effort to reduce Upland cotton’s genetic vulnerability [110]. Inbred backcrossing was used to reduce the reproductive barriers associated with interspecific introgression between these species [111,112]. By establishing a comprehensive set of near-isogenic introgression lines from the BC2 or BC3 families, one can evaluate very small segments of introgressed DNA for agronomic or fiber quality performance and analyze them for QTLs.
Because recombination and segregation have split the donor genome into smaller segments, the effects of specific genetic loci can be defined more clearly than in earlier generations. Table 2 shows the number of QTLs detected in each of the inbred backcross populations. Rong et al. [18] reported the alignment of 432 fiber QTLs identified in 10 interspecific G. hirsutum × G. barbadense populations into a consensus map, providing further information on the genetic dissection of each of the fiber properties.
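The reason BC2 and BC3 families carry only small donor segments follows from simple arithmetic: the F1 is half donor, and each backcross to the recurrent parent halves the expected donor share. The sketch below illustrates this textbook expectation; the function name is hypothetical, and the calculation ignores selection and the segregation distortion discussed earlier, both of which shift the realized values in actual populations.

```python
def expected_donor_fraction(n_backcrosses):
    """Expected donor-genome fraction after n backcrosses to the recurrent parent.

    The F1 carries 1/2 donor genome and each backcross halves it on average,
    so a BCn plant is expected to carry (1/2) ** (n + 1).
    """
    if n_backcrosses < 0:
        raise ValueError("number of backcrosses cannot be negative")
    return 0.5 ** (n_backcrosses + 1)


for n in range(1, 4):
    print(f"BC{n}: expected donor genome = {expected_donor_fraction(n):.2%}")
```

Under these assumptions a BC3 plant is expected to carry about one sixteenth of the donor genome, which is why a reasonably large set of lines is needed to cover the whole donor genome with introgressed segments.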
Table 2. Summary of QTLs mapped in interspecific Gossypium populations.