3.2 | Analysis of CRISPR repeat-spacer arrays
The CRISPR array was composed of DR sequences separated by intervening spacer sequences. As presented in Figure 2A, the size and base arrangement of the repeat sequence were relatively conserved within each CRISPR type. The repeat sequences were 29 nucleotides long for type I-E and IV-A systems and 28 nucleotides long for the type I-E* system (Figure 2D). In addition, RNA secondary structure prediction showed that all types of repeats formed stable stem-loop structures in the middle (Figure 2C). According to the predictions, the stem lengths were 9, 7, and 6 bp for types I-E, I-E*, and IV-A, respectively (Figure 2C). The conservation and stability of the repeat secondary structures can be assessed from the structure diagram and the MFE value: a lower MFE value indicates higher structural stability, and a longer stem corresponds to a more stable structure. As shown in Figure 2C, the secondary structure of the type I-E repeat had the lowest MFE value and the longest stem (colored in red), suggesting that the type I-E repeat formed the most stable secondary structure.
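As an illustration of how a stem length can be read off a predicted structure, the following minimal Python sketch counts the longest run of stacked base pairs in a dot-bracket string, the notation commonly emitted by RNA folding tools. The function names and the example structure are hypothetical and are not taken from this study; the sketch assumes balanced brackets and no pseudoknots.

```python
def pair_table(structure):
    """Map each opening-bracket index to its closing-bracket index."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i  # assumes balanced brackets
    return pairs


def longest_stem(structure):
    """Length (in bp) of the longest uninterrupted helix in the structure."""
    pairs = pair_table(structure)
    best = 0
    for start, end in pairs.items():
        # Skip pairs that sit inside a helix; count only from the outermost pair.
        if pairs.get(start - 1) == end + 1:
            continue
        i, j, length = start, end, 0
        while pairs.get(i) == j:
            length += 1
            i, j = i + 1, j - 1
        best = max(best, length)
    return best


# Hypothetical 29-nt dot-bracket string, invented for illustration only.
example = "....(((((((((....)))))))))..."
print(longest_stem(example))
```

A bulge or internal loop breaks the run, so the function reports the longest continuous helix rather than the total number of paired bases.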
Spacer sequences are captured into the CRISPR array with the aid of Cas proteins, and the number of spacers reflects the activity of the CRISPR/Cas system. As seen in Figure 2E, the number of spacers varied widely. Among the 105 strains carrying a CRISPR/Cas system, strain WUSM_KV_47 had the largest number of spacers (41 type I-E spacers), whereas strain EuSCAPE_TR218 had the smallest (3 type IV-A spacers). Moreover, the type I-E system (26.5, 17-37) had more spacers than the type I-E* (13, 8.5-15; p < 0.001) and type IV-A (16, 13-20; p < 0.001) systems. However, there was no significant difference in spacer number between the type I-E* and type IV-A systems (13, 8.5-15 vs 16, 13-20; p = 0.128).
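The group summaries above can be illustrated with a short Python sketch. It assumes that the values in parentheses are a median with an interquartile range and that a rank-based comparison is appropriate for count data; this passage states neither explicitly, so both are assumptions. The function names and the spacer counts below are hypothetical and are not data from this study.

```python
from statistics import median, quantiles


def summarize(counts):
    """Return (median, Q1, Q3) for a list of per-strain spacer counts."""
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return median(counts), q1, q3


def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for group a versus group b.

    Counts the pairs (x, y) with x > y; ties contribute 0.5.
    Only the statistic is computed here, not a p-value.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical spacer counts, invented for illustration only.
group_a = [17, 22, 26, 27, 37, 41]
group_b = [8, 9, 13, 14, 15]

print(summarize(group_a))
print(summarize(group_b))
print(mann_whitney_u(group_a, group_b), "out of", len(group_a) * len(group_b))
```

A U statistic close to the product of the two group sizes means that nearly every count in the first group exceeds nearly every count in the second; a statistics package would be needed to convert it into a p-value.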
The PAM plays an important role in the acquisition of spacer sequences. As shown in Figure 2B, the PAM sequences for the type I-E and I-E* systems were inferred to be 5′-AAG-3′ and 5′-(C)GAA-3′, respectively. Because the PAM is an essential element for Cas proteins to recognize and degrade foreign DNA, different PAMs suggest different Cas protein variants. The difference in PAM between the type I-E and I-E* systems further supported their evolutionary and functional divergence. Notably, the PAM predicted for the type IV-A system (5′-AAG-3′) was identical to that predicted for type I-E.
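One simple way to picture PAM inference is as a per-position consensus over the short sequences that flank protospacer matches. The Python sketch below is an assumption about how such a consensus could be derived, not the method used in this study; the function name, the `min_fraction` threshold, and the flanking sequences are all hypothetical.

```python
from collections import Counter


def consensus(flanks, min_fraction=0.5):
    """Per-position consensus of equal-length flanking sequences.

    A position is reported as 'N' when its most frequent base occurs in
    fewer than min_fraction of the sequences.
    """
    if not flanks:
        raise ValueError("no flanking sequences given")
    flanks = [f.upper() for f in flanks]
    if len({len(f) for f in flanks}) != 1:
        raise ValueError("flanking sequences must have equal length")
    result = []
    for column in zip(*flanks):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(flanks) >= min_fraction else "N")
    return "".join(result)


# Hypothetical 3-nt flanks, invented for illustration only.
flanks = ["AAG", "AAG", "ATG", "AAG", "AAG"]
print(consensus(flanks))
```

In practice a sequence logo conveys more than a single consensus string, because it also shows how strongly each position is conserved.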