Percent Sequence Identity of bacterial ribosomal S1 proteins
For all available bacterial ribosomal S1 sequences, we collected the
results of pairwise alignments within bacterial phyla and calculated the
percent identity (PID) for each phylum and each group (defined by the
number of structural domains). These data are integrated into the
server, which can be accessed at http://oka.protres.ru:4200 (Analysis
PID). An example of a PID analysis of four-domain S1 proteins is given
in Figure 2a, and an example of a PID analysis of five-domain S1
proteins within the Deinococcus-Thermus phylum is shown in Figure 2b.
The S1 server also allows the user to obtain the amino acid sequence of
any record in the dataset by clicking on the corresponding UniProt code
(Figure 1c).
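The PID values in this section are derived from pairwise alignments. The text does not state which PID definition the server applies, so the following is only a minimal sketch, assuming identity is counted over aligned columns in which neither sequence has a gap. The function name `percent_identity` and the toy sequences are hypothetical, not S1 records from the dataset.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: PID = identical residues / columns where neither sequence
    has a gap, times 100. Other common definitions divide by the full
    alignment length or by the length of the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue.
    paired = [(x, y) for x, y in zip(aligned_a, aligned_b)
              if x != gap and y != gap]
    if not paired:
        return 0.0
    # Count identical residues, ignoring letter case.
    identical = sum(1 for x, y in paired if x.upper() == y.upper())
    return 100.0 * identical / len(paired)


# Toy example with made-up fragments: 6 identical of 7 gap-free columns.
print(round(percent_identity("MAE-FKELL", "MAEQFRE-L"), 1))  # 85.7
```

Because the choice of denominator changes the result, values computed this way are not guaranteed to match those reported by the server.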
As noted in ref. 9, aligning the sequences of bacterial S1 proteins
reveals a relatively low percent of sequence identity both within
individual phyla and between them.
For one-domain S1 proteins, the highest percent of sequence identity
within an individual phylum belongs to the Actinobacteria phylum (58%)
and the lowest to the Tenericutes phylum (25%). Each of the other phyla
in this group has only one representative. Between taxonomic phyla in
this group, the percent of sequence identity ranges from 10% (for
example, Actinobacteria and Bacteroidetes; Tenericutes and
Actinobacteria) to 18% (Bacteroidetes and Firmicutes)
(http://oka.protres.ru:4200/protein/5eb71e488886fe5b65803db9/pid).
Within this group, the actinobacteria Amycolatopsis vancoresmycina and
Actinoplanes friuliensis have the highest percent of sequence identity
(24%). Thus, our data suggest that the one-domain S1 proteins can be
classified as a distinct group of S1 proteins. The uniqueness of such
proteins has also been noted previously17.
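Each domain group is described by three kinds of values: PID within a phylum, PID between phyla, and the single most similar pair. The section does not say how the server aggregates pairwise values, so the sketch below assumes that a within-phylum value is the mean PID of all pairs inside that phylum and that a between-phylum value is the mean over all cross-phylum pairs. The function `group_pid`, its input format, and the toy numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean


def group_pid(phylum_of: dict[str, str], pid: dict[frozenset, float]):
    """Summarise pairwise PID values by phylum (hypothetical helper).

    phylum_of: sequence id -> phylum name.
    pid: frozenset({id1, id2}) -> PID of that pair; every pair must be present.
    Returns (within, between, best_pair):
      within    - phylum -> mean PID of pairs inside that phylum
      between   - (phylum_a, phylum_b) sorted -> mean PID of cross-phylum pairs
      best_pair - the pair of ids with the highest PID
    """
    within_vals: dict[str, list[float]] = {}
    between_vals: dict[tuple, list[float]] = {}
    for a, b in combinations(sorted(phylum_of), 2):
        value = pid[frozenset((a, b))]
        pa, pb = phylum_of[a], phylum_of[b]
        if pa == pb:
            within_vals.setdefault(pa, []).append(value)
        else:
            key = tuple(sorted((pa, pb)))
            between_vals.setdefault(key, []).append(value)
    within = {p: mean(v) for p, v in within_vals.items()}
    between = {k: mean(v) for k, v in between_vals.items()}
    best_pair = max(pid, key=pid.get)
    return within, between, best_pair


# Toy data: two Actinobacteria sequences and one Firmicutes sequence.
phylum_of = {"s1": "Actinobacteria", "s2": "Actinobacteria", "s3": "Firmicutes"}
pid = {
    frozenset(("s1", "s2")): 60.0,
    frozenset(("s1", "s3")): 20.0,
    frozenset(("s2", "s3")): 24.0,
}
within, between, best = group_pid(phylum_of, pid)
print(within)        # {'Actinobacteria': 60.0}
print(between)       # {('Actinobacteria', 'Firmicutes'): 22.0}
print(sorted(best))  # ['s1', 's2']
```

A phylum with only one representative contributes no within-phylum pairs, so under this scheme it appears only in the between-phylum comparisons.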
Across all studied phyla, only a few bacteria (0.8% of all sequences,
Figure 1) were found to contain two S1 domains (some members of the
Actinobacteria, Firmicutes, and Proteobacteria phyla). For two-domain
S1 proteins, the highest percent of sequence identity within an
individual phylum belongs to the Actinobacteria phylum (24%) and the
lowest to the Proteobacteria phylum (15%). Between taxonomic phyla in
this group, the percent of sequence identity is 13% (Actinobacteria
and Firmicutes; Proteobacteria and Firmicutes) and 15% (Actinobacteria
and Proteobacteria)
(http://oka.protres.ru:4200/protein/5eb71e4b07439c8b4d90c98a/pid).
Within this group, the actinobacteria Streptomyces rimosus and
Amycolatopsis mediterranei have the highest percent of sequence
identity (85%).
In all cases, members of the phylum Cyanobacteria (Terrabacteria
superphylum) have three-domain S1 proteins; three-domain S1 proteins
are also found in some representatives of the Actinobacteria phylum
(Gram-positive Terrabacteria) and in the single representative of the
Proteobacteria phylum in this group. Overall, three-domain S1 proteins
account for 2% of all sequences (Figure 1). Within the Cyanobacteria
phylum, the percent of sequence identity is 38%. Between taxonomic
phyla in this group, the percent of sequence identity is 14% for
Actinobacteria and Cyanobacteria, and 15% both for Actinobacteria and
Proteobacteria and for Proteobacteria and Cyanobacteria. Within this
group, the cyanobacterial strains Microcystis aeruginosa TAIHU98 and
Microcystis aeruginosa DIANCHI905 have the highest percent of sequence
identity (99%).
Records with four S1 domains account for 33% of all ribosomal S1
proteins studied. Almost all bacteria in this group belong to the phyla
Actinobacteria (52% of all four-domain S1 proteins) and Firmicutes (45%
of all four-domain S1 proteins) (Figure 2). The phyla Bacteroidetes and
Caldiserica each have only one representative in this group. For
four-domain S1 proteins, the highest percent of sequence identity
within an individual phylum belongs to the Actinobacteria phylum (64%)
and the lowest to the Chloroflexi phylum (27%). Between taxonomic phyla
in this group, the percent of sequence identity varies from 14% (for
example, Proteobacteria and Planctomycetes) to 23% (Actinobacteria and
Firmicutes). Within this group, the actinobacterial strains
Mycobacterium tuberculosis (strain ATCC 25618 / H37Rv) and
Mycobacterium tuberculosis (strain CDC 1551 / Oshkosh) have the highest
percent of sequence identity (100%).
Bacteria of the monotypic phylum Deinococcus-Thermus (which consists of
the single class Deinococci) always have five S1 domains. Five S1
domains are also found in bacteria of the Synergistetes,
Haloplasmatales, Verrucomicrobia, Proteobacteria, Planctomycetes, and
Chlamydiae phyla. Overall, five-domain S1 proteins make up 1.2% of all
studied ribosomal S1 proteins (Figure 1). For five-domain S1 proteins,
the highest percent of sequence identity within an individual phylum
belongs to the Deinococcus-Thermus phylum (53%) and the lowest to the
Proteobacteria phylum (26%). Between taxonomic phyla in this group, the
percent of sequence identity ranges from 14% (for example,
Planctomycetes and Haloplasmatales) to 31% (Verrucomicrobia and
Chlamydiae)
(http://oka.protres.ru:4200/protein/5eb71e57aab9de6c16cb9d0d/pid).
Within this group, Thermus parvatiensis and Thermus thermophilus of the
phylum Deinococcus-Thermus have the highest percent of sequence
identity (97%).
About 62% of the records were identified as proteins containing six S1
domains (Figure 1). Most of these proteins belong to the Proteobacteria
phylum (86% of all six-domain S1 proteins). The distribution of
taxonomic classes within the phylum Proteobacteria is 23%, 15%, 55%,
3%, and 4% for the alpha, beta, gamma, delta, and epsilon classes,
respectively. Ribosomal S1 proteins from bacteria of the phyla Chlorobi
(green sulfur bacteria), Acidobacteria, Aquificae, Deferribacteres,
Fibrobacteres, Fusobacteria, Gemmatimonadetes, Ignavibacteriae,
Nitrospirae, Oligoflexia, Planctomycetes, and Verrucomicrobia also have
six S1 domains, as do those of the Gram-negative Spirochaetes,
Bacteroidetes, and Chlamydiae phyla. For six-domain S1 proteins, the
highest percent of sequence identity within an individual phylum
belongs to the Chlamydiae phylum (69%) and the lowest to the
Spirochaetes phylum (37%). Between taxonomic phyla in this group, the
percent of sequence identity ranges from 15% (Fusobacteria and
Acidobacteria; Fusobacteria and Nitrospirae) to 45% (beta and gamma
Proteobacteria). Within this group, Chlamydia trachomatis and Chlamydia
muridarum of the phylum Chlamydiae have the highest percent of sequence
identity (95%).