Percent sequence identity of bacterial ribosomal S1 proteins
For all available bacterial ribosomal S1 sequences, we calculated the percent sequence identity (PID) for each phylum and for each group (defined by the number of structural domains), together with the results of pairwise alignment within the bacterial phyla. The data are integrated into the server, which can be accessed at http://oka.protres.ru:4200 (Analysis PID). An example of a PID analysis of four-domain S1 proteins is given in Figure 2a. An example of a PID analysis of five-domain S1 proteins within the Deinococcus-Thermus phylum is shown in Figure 2b. The S1 server also allows the user to obtain amino acid sequence information for any record in the dataset by clicking on the appropriate UniProt code (Figure 1c).
As mentioned in 9, aligning the sequences of bacterial S1 proteins revealed a relatively low percent sequence identity both within individual phyla and between them.
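As an illustration of how within-phylum and between-phylum values of this kind can be obtained, the following minimal Python sketch computes the PID of two already aligned sequences and averages the pairwise values per phylum and per pair of phyla. It is not the procedure used by the S1 server. The function names, the toy sequences, the choice of denominator (alignment columns in which at least one sequence has a residue) and the use of the arithmetic mean are all assumptions made for this example; other PID definitions use different denominators and give different values.

```python
from collections import defaultdict
from statistics import mean


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: the denominator is the number of alignment columns in
    which at least one sequence has a residue (double-gap columns are
    skipped). This is one of several common PID definitions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # column carries no residue in either sequence
        columns += 1
        if x == y:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def summarize_by_phylum(pair_results):
    """Average pairwise PID within each phylum and between phyla.

    pair_results: iterable of (phylum_a, phylum_b, pid) tuples, one per
    aligned sequence pair (hypothetical input format).
    """
    within = defaultdict(list)
    between = defaultdict(list)
    for phylum_a, phylum_b, pid in pair_results:
        if phylum_a == phylum_b:
            within[phylum_a].append(pid)
        else:
            # Sort the names so (A, B) and (B, A) share one key.
            between[tuple(sorted((phylum_a, phylum_b)))].append(pid)
    within_mean = {k: mean(v) for k, v in within.items()}
    between_mean = {k: mean(v) for k, v in between.items()}
    return within_mean, between_mean


# Toy aligned fragments, invented for illustration only.
toy_pairs = [
    ("Actinobacteria", "Actinobacteria", percent_identity("MKVLA", "MKVLA")),
    ("Actinobacteria", "Firmicutes", percent_identity("MKV-LA", "MRVQLA")),
]
within_mean, between_mean = summarize_by_phylum(toy_pairs)
print(within_mean)
print(between_mean)
```

In practice the aligned pairs would come from a pairwise alignment program; the sketch only covers the counting and averaging steps.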
For S1 proteins containing one domain, the highest percent sequence identity within an individual phylum is found in the Actinobacteria phylum (58%) and the lowest in the Tenericutes phylum (25%). The other phyla in this group are mono-representatives. Between taxonomic phyla in this group, the percent sequence identity ranges from 10% (for example, Actinobacteria and Bacteroidetes; Tenericutes and Actinobacteria) to 18% (Bacteroidetes and Firmicutes) (http://oka.protres.ru:4200/protein/5eb71e488886fe5b65803db9/pid). For this group, the Actinobacteria Amycolatopsis vancoresmycina and Actinoplanes friuliensis (24%) have the highest percent sequence identity. Thus, based on our data, it seems possible to classify the one-domain S1 proteins as a unique group of S1 proteins. The uniqueness of such proteins is also mentioned in 17.
Across all studied phyla, only a few bacteria (0.8% of all sequences, Figure 1) were found to contain two S1 domains (some bacteria from the Actinobacteria, Firmicutes, and Proteobacteria phyla). For S1 proteins containing two domains, the highest percent sequence identity within an individual phylum is found in the Actinobacteria phylum (24%) and the lowest in the Proteobacteria phylum (15%). Between taxonomic phyla in this group, the percent sequence identity is 13% (Actinobacteria and Firmicutes; Proteobacteria and Firmicutes) and 15% (Actinobacteria and Proteobacteria) (http://oka.protres.ru:4200/protein/5eb71e4b07439c8b4d90c98a/pid). For this group, the Actinobacteria Streptomyces rimosus and Amycolatopsis mediterranei have the highest percent sequence identity (85%).
In all cases, S1 proteins of the phylum Cyanobacteria (Terrabacteria superphylum) have three S1 domains; some representatives of the Actinobacteria phylum (G+ Terrabacteria) and of the Proteobacteria phylum (mono-representative) also have three-domain S1 proteins. Overall, three-domain S1 proteins are identified in 2% of cases (Figure 1). Within the Cyanobacteria phylum, the percent sequence identity is 38%. Between taxonomic phyla in this group, the percent sequence identity is 14% for Actinobacteria and Cyanobacteria and 15% both for Actinobacteria and Proteobacteria and for Proteobacteria and Cyanobacteria. For this group, the Cyanobacteria strains Microcystis aeruginosa TAIHU98 and Microcystis aeruginosa DIANCHI905 have the highest percent sequence identity (99%).
Records with four S1 domains account for 33% of all ribosomal S1 proteins studied. Almost all analyzed bacteria in this group belong to the phyla Actinobacteria (52% of all four-domain S1 proteins) and Firmicutes (45% of all four-domain S1 proteins) (Figure 2). The phyla Bacteroidetes and Caldiserica are mono-representatives in this group. For S1 proteins containing four domains, the highest percent sequence identity within an individual phylum is found in the Actinobacteria phylum (64%) and the lowest in the Chloroflexi phylum (27%). Between taxonomic phyla in this group, the percent sequence identity varies from 14% (for example, Proteobacteria and Planctomycetes) to 23% (Actinobacteria and Firmicutes). For this group, the Actinobacteria strains Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) and Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh) have the highest percent sequence identity (100%).
Bacteria of the monotypic phylum Deinococcus-Thermus (consisting of the single class Deinococci) always have five S1 domains. Five S1 domains are also found in bacteria of the Synergistetes, Haloplasmatales, Verrucomicrobia, Proteobacteria, Planctomycetes and Chlamydiae phyla. Overall, five-domain S1 proteins make up 1.2% of all studied ribosomal S1 proteins (Figure 1). For S1 proteins containing five domains, the highest percent sequence identity within an individual phylum is found in the Deinococcus-Thermus phylum (53%) and the lowest in the Proteobacteria phylum (26%). Between taxonomic phyla in this group, the percent sequence identity ranges from 14% (for example, Planctomycetes and Haloplasmatales) to 31% (Verrucomicrobia and Chlamydiae) (http://oka.protres.ru:4200/protein/5eb71e57aab9de6c16cb9d0d/pid). For this group, Thermus parvatiensis and Thermus thermophilus of the Deinococcus-Thermus phylum have the highest percent sequence identity (97%).
About 62% of the records were identified as proteins containing six S1 domains (Figure 1). Most of these proteins belong to the Proteobacteria phylum (86% of all six-domain S1 proteins). The distribution of taxonomic classes within the phylum Proteobacteria is 23%, 15%, 55%, 3%, and 4% for the alpha, beta, gamma, delta, and epsilon classes, respectively. Ribosomal S1 proteins from bacteria of the phyla Chlorobi (green sulfur bacteria), Acidobacteria, Aquificae, Deferribacteres, Fibrobacteres, Fusobacteria, Gemmatimonadetes, Ignavibacteriae, Nitrospirae, Oligoflexia, Planctomycetes and Verrucomicrobia also have six S1 domains. Gram-negative bacteria containing six S1 domains include the Spirochaetes, Bacteroidetes, and Chlamydiae phyla. For S1 proteins containing six domains, the highest percent sequence identity within an individual phylum is found in the Chlamydiae phylum (69%) and the lowest in the Spirochaetes phylum (37%). Between taxonomic phyla in this group, the percent sequence identity ranges from 15% (Fusobacteria and Acidobacteria; Fusobacteria and Nitrospirae) to 45% (beta and gamma Proteobacteria). For this group, Chlamydia trachomatis and Chlamydia muridarum of the phylum Chlamydiae have the highest percent sequence identity (95%).