3.3 | Relationship between CRISPR distribution and MLST
MLST identified 243 different sequence types (STs) among the 427 K. variicola strains, although 60 strains could not be assigned to a defined ST because of the limited information in the PubMLST database. The most prevalent ST was ST20 (18/427, 4.22%), followed by ST60 (14/427, 3.28%) and ST10 (11/427, 2.58%). Further analysis showed that the distributions of the type I-E and type I-E* systems were strongly associated with MLST, whereas the type IV-A system was scattered across the whole genetic lineage (Figure 3). For example, when one strain within an ST harbored a type I-E or I-E* system, all strains within that ST were type I-E-positive (e.g., ST20, ST92, ST108, ST137) or I-E*-positive (e.g., ST188). This pattern was not observed in strains containing the type IV-A system.
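The all-or-none pattern described above can be checked with a short script. The following is a minimal sketch, not the analysis pipeline used in this study: the function name `st_concordance`, the record layout, and the toy records (with placeholder ST labels) are all assumptions made for illustration.

```python
from collections import defaultdict

def st_concordance(records, crispr_type):
    """Label each ST as 'all', 'none', or 'mixed' for one CRISPR-Cas type.

    records: iterable of (strain_id, st, set_of_crispr_types).
    """
    flags_by_st = defaultdict(list)
    for _strain_id, st, types in records:
        flags_by_st[st].append(crispr_type in types)

    status = {}
    for st, flags in flags_by_st.items():
        if all(flags):
            status[st] = "all"      # every strain in the ST carries the type
        elif not any(flags):
            status[st] = "none"     # no strain in the ST carries the type
        else:
            status[st] = "mixed"    # carriage is not ST-wide
    return status

# Hypothetical toy records; ST labels are placeholders, not data from this study.
toy_records = [
    ("s1", "ST_a", {"I-E"}),
    ("s2", "ST_a", {"I-E", "IV-A"}),
    ("s3", "ST_b", {"I-E*"}),
    ("s4", "ST_b", {"I-E*"}),
    ("s5", "ST_c", set()),
    ("s6", "ST_c", {"IV-A"}),
]

type_ie = st_concordance(toy_records, "I-E")
type_iva = st_concordance(toy_records, "IV-A")
```

An ST-associated system would yield only 'all' or 'none' labels, as type I-E does in the toy data, whereas a scattered system such as type IV-A would produce 'mixed' STs.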
To further clarify the relationship between CRISPR evolution and MLST, a hierarchical clustering analysis was performed based on the presence of spacers. The spacer contents of the type I systems were likewise strongly associated with MLST (Figure 4A and 4B). For example, all ST20 strains harbored relatively conserved type I-E spacer contents, and type I-E* spacers also clustered clearly by ST (e.g., ST188). In contrast, type IV-A spacer contents showed no association with MLST. As shown in Figure 4C, the type IV-A spacer compositions of ST271, ST115, and ST148 were highly similar even though these STs were phylogenetically unrelated.
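Clustering strains by spacer presence can be illustrated as follows. This is a sketch under stated assumptions: the text does not specify the distance metric or linkage method, so Jaccard distance and average linkage are used here purely as an example, and the strain names, spacer identifiers, and threshold are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage(spacers, threshold):
    """Merge the closest clusters until their distance exceeds the threshold.

    spacers: dict mapping strain name -> set of spacer identifiers.
    Assumed metric and linkage: Jaccard distance, average linkage.
    """
    clusters = [[name] for name in sorted(spacers)]

    def dist(c1, c2):
        # Mean pairwise Jaccard distance between members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(spacers[x], spacers[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        if dist(clusters[i], clusters[j]) > threshold:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)

# Hypothetical spacer sets; identifiers are placeholders, not data from this study.
toy_spacers = {
    "s1": {"sp1", "sp2", "sp3"},
    "s2": {"sp1", "sp2", "sp3", "sp4"},
    "s3": {"sp7", "sp8"},
    "s4": {"sp7", "sp8", "sp9"},
}

groups = average_linkage(toy_spacers, threshold=0.5)
```

Comparing the resulting groups with the ST of each strain shows whether spacer content follows MLST, as reported for the type I systems, or cuts across unrelated STs, as reported for type IV-A.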