2.3 | Repeat structure prediction
Sequence conservation across the repeat sequences was visualized with the WebLogo web server (https://weblogo.berkeley.edu/logo.cgi). The RNA secondary structures of the repeat sequences were predicted with the RNAfold web server, and the minimum free energy (MFE) of each predicted structure was calculated.
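For readers reproducing the logo step, the height of each stack in a sequence logo is the information content at that position. The Python sketch below computes it for a set of equal-length, gap-free repeats as R_i = log2(4) - H_i, where H_i is the Shannon entropy of column i. It omits the optional small-sample correction that WebLogo can apply, and the example repeats are hypothetical, not repeats from this study.

```python
import math
from collections import Counter


def information_content(repeats, alphabet_size=4):
    """Per-position information content (bits) of equal-length, gap-free repeats.

    R_i = log2(alphabet_size) - H_i, where H_i is the Shannon entropy of column i.
    No small-sample correction is applied (an assumption of this sketch).
    """
    length = len(repeats[0])
    if any(len(seq) != length for seq in repeats):
        raise ValueError("all repeats must have the same length")
    max_bits = math.log2(alphabet_size)
    n = len(repeats)
    content = []
    for i in range(length):
        # Count each base observed in column i across all repeats
        counts = Counter(seq[i].upper() for seq in repeats)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        content.append(max_bits - entropy)
    return content


# Hypothetical 4-nt repeats: only the third column varies
print(information_content(["GUUC", "GUUC", "GUAC", "GUGC"]))
```

Fully conserved positions reach the 2-bit maximum, while the variable third position drops to 0.5 bits.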
2.4 | MLST and phylogenetic analyses
In silico MLST analysis was performed with MLST 2.0, available on the CGE website (https://cge.cbs.dtu.dk/services/MLST/), using the seven housekeeping genes leuS, pgi, pgk, phoE, pyrG, rpoB, and fusA as queries [32]. Multiple sequence alignment was performed with MUSCLE v3.8.31 [33], and a phylogenetic tree was then constructed with MEGA v7.0 using the neighbor-joining method. The tree was visualized with iTOL v6 (https://itol.embl.de).
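To illustrate the neighbor-joining step (this is not the MEGA implementation itself), the Python sketch below builds an unrooted tree in Newick format from a distance matrix using the Saitou-Nei selection criterion. Filling the matrix with the uncorrected p-distance is an assumption, since the substitution model used in MEGA is not stated here; the helper names are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences; gapped sites are skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Return an unrooted neighbor-joining tree as a Newick string.

    names: taxon labels; dist: symmetric distance matrix (list of lists).
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [[float(v) for v in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]
        # Saitou-Nei criterion: join the pair minimising Q(i, j) = (n-2)d(i,j) - r_i - r_j
        _, i, j = min(
            ((n - 2) * d[x][y] - r[x] - r[y], x, y)
            for x in range(n)
            for y in range(x + 1, n)
        )
        # Branch lengths from i and j to the new internal node u
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to every remaining node k
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[p][q] for q in keep] + [du[idx]] for idx, p in enumerate(keep)] + [du + [0.0]]
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"]
    # Three nodes left: resolve the final star into three branch lengths
    a, b, c = nodes
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = d[0][1] - la
    lc = d[0][2] - la
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"
```

The resulting Newick string is a format that iTOL accepts for upload and display. Note that neighbor-joining can yield slightly negative branch lengths on noisy distances; this sketch reports them as computed.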
2.5 | Spacer analysis, protospacer target identification, and protospacer adjacent motif (PAM) determination
The putative origins of the CRISPR spacers were identified with the CRISPRTarget web server (http://crispr.otago.ac.nz/CRISPRTarget/crispr_analysis.html). The 8-bp sequences immediately upstream of the predicted protospacers were extracted, and the PAM was predicted from them with the WebLogo web server (https://weblogo.berkeley.edu/logo.cgi). Hierarchical clustering of the spacers was performed with the seaborn module in a Python script. The network linking K. variicola spacers to mobile genetic elements (MGEs) from other species was visualized in Gephi (https://github.com/gephi/gephi), with the layout generated by combining the Fruchterman-Reingold and Noverlap algorithms. In this network, each pair of connected species shares at least one spacer-protospacer match.
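The flank-extraction step can be sketched as follows. The sketch assumes each protospacer hit is given as 0-based, end-exclusive coordinates on the plus strand of the target sequence together with the strand of the match, and that "upstream" means 5' of the protospacer read in its own orientation. Under that convention, the upstream flank of a minus-strand hit is the reverse complement of the 8 nt downstream on the plus strand. The function names and coordinate convention are hypothetical and do not reproduce the CRISPRTarget output format.

```python
# Complement table for DNA (N maps to N)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (output is upper case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def upstream_flank(target: str, start: int, end: int, strand: str, k: int = 8):
    """Return the k nt immediately 5' of a protospacer, read in its own orientation.

    start/end: 0-based, end-exclusive protospacer coordinates on the + strand of target.
    strand: '+' or '-', the strand on which the protospacer matches the spacer.
    Returns None when the flank would run past either end of target.
    """
    target = target.upper()
    if strand == "+":
        # On the + strand the 5' flank lies to the left of the protospacer
        if start - k < 0:
            return None
        return target[start - k:start]
    if strand == "-":
        # On the - strand the 5' flank is the reverse complement of the right-hand flank
        if end + k > len(target):
            return None
        return reverse_complement(target[end:end + k])
    raise ValueError("strand must be '+' or '-'")
```

Flanks collected this way for all hits have equal length, so they can be submitted together to WebLogo; hits whose flank runs off a contig end (returned as None) would need to be excluded first.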