Logo motifs
Logo motifs for each study group (defined by the number of structural domains) were derived from alignments of the S1 protein sequences. An example of logo motif profiles for S1 proteins containing two domains is shown in Figure 3 (Logo analysis motif, http://oka.protres.ru:4200). In addition, users can create logo profiles for full-length sequences using the S1 server.
Analysis of the S1 protein profiles revealed several specific patterns. The most conserved sequence regions correspond to β-strands, which agrees with our earlier data on the high conservation of secondary structure in such proteins 38. In addition, an increase in the number of structural S1 domains correlates with an increase in conservation within each individual domain. Proteins containing five domains are an exception, possibly because few sequences represent this group.
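The conservation shown in a logo profile can be illustrated with a minimal sketch of the per-column information content commonly used for sequence logos (log2 of the alphabet size minus the Shannon entropy of the column). This is not the S1 server's implementation, which is not described here. The function names, the toy alignment, and the omission of a small-sample correction are assumptions made for illustration only.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column; gaps are ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Maximum possible information for a 20-letter alphabet is log2(20) bits.
    return math.log2(alphabet_size) - entropy


def logo_profile(alignment):
    """Per-column information content for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_information([seq[i] for seq in alignment]) for i in range(length)]


# Hypothetical toy alignment, not taken from the S1 data set.
toy_alignment = ["FKVGD", "FRVGE", "FKIGD", "FKV-D"]
profile = logo_profile(toy_alignment)
```

Columns with a single residue type reach the maximum of log2(20) bits, whereas variable columns score lower. With few sequences, as in the five-domain group, such uncorrected values tend to overestimate conservation.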
As shown previously 9, single-domain S1 proteins share a relatively low percentage of identity with each other (27%). The residues F19, F22, H34, N64, and R68, described as conserved 39 and forming the RNA-binding site in other bacterial, archaeal, and eukaryotic S1 domain-containing proteins 7, were not strictly conserved in the logo motif of this group (http://oka.protres.ru:4200/protein/5eb71e488886fe5b65803db9/logo). For this group, residues F19, F22, and R68 are conserved only in some bacteria. At the same time, single-domain S1 proteins of parasitic bacteria of the class Mollicutes (phylum Tenericutes) are known to perform the main RNA-binding function effectively 40. It is possible that in these bacteria the RNA-binding site is formed by specific amino acid residues, or that the RNA-binding mechanism differs from that of other proteins containing the S1 domain.
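How the presence of the RNA-binding residues might be checked in an alignment can be sketched as follows. The mapping from reference positions (19, 22, 34, 64, 68) to alignment columns, the 0.8 conservation threshold, and the toy sequences are all hypothetical. The criteria actually used in this work are not specified in the text.

```python
from collections import Counter

# RNA-binding residues named in the text, keyed by reference position.
BINDING_SITE = {19: "F", 22: "F", 34: "H", 64: "N", 68: "R"}


def site_conservation(alignment, column_of, threshold=0.8):
    """Report, per binding-site position, how often the expected residue occurs.

    column_of maps a reference position to a 0-based alignment column
    (an assumed, user-supplied mapping). threshold is an arbitrary cutoff.
    """
    report = {}
    for pos, expected in BINDING_SITE.items():
        residues = [seq[column_of[pos]] for seq in alignment]
        residues = [c for c in residues if c != "-"]  # ignore gaps
        counts = Counter(residues)
        most_common = counts.most_common(1)[0][0] if counts else "-"
        fraction = counts[expected] / len(residues) if residues else 0.0
        report[pos] = {
            "most_common": most_common,
            "expected_fraction": fraction,
            "conserved": fraction >= threshold,
        }
    return report


# Hypothetical five-column alignment holding only the binding-site positions.
toy_sites = ["FFHNR", "FLHNR", "FLRNR", "FLHDR"]
toy_columns = {19: 0, 22: 1, 34: 2, 64: 3, 68: 4}
report = site_conservation(toy_sites, toy_columns)
```

In this toy case, position 22 would be reported with L as the most common residue, similar in spirit to the substitutions at canonical positions noted in the text, such as L22 at the position of F22.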
In S1 proteins containing two structural domains, the first and second domains also show a low percentage of identity within each domain: 27% and 30%, respectively. The first and second domains share 38% identity with each other, while pairs with the maximum and minimum values of identity have been identified for the remaining domains 9. For the first domain in this group, the F19, F22, and R68 residues of the RNA-binding site are conserved. For the second domain in this group, F19 and H34 are conserved (Figure 3a).
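The percentages of identity quoted here depend on how identity is counted. The sketch below uses one common convention: identical positions divided by the number of aligned columns in which neither sequence has a gap. This choice of denominator, the function names, and the toy sequences are assumptions, and the definition used in ref. 9 may differ.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(sequences):
    """Average percent identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned fragments, not real S1 domain sequences.
toy_domains = ["FKVGD-E", "FRVGDAE", "FKVGDAE"]
identity_01 = percent_identity(toy_domains[0], toy_domains[1])
mean_identity = mean_pairwise_identity(toy_domains)
```

Averaging over all pairs within one domain position gives a within-domain value, whereas comparing sequences from different domain positions gives a between-domain value.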
For S1 proteins containing three structural domains, the maximum identity was found between the first and third domains (53%) and the minimum between the first and second domains (42%). Moreover, the third domain has the highest percentage of identity (57%) among the domains in this group 9. For the first domain in this group, the N64 residue of the RNA-binding site is conserved. N64, R68, and R34 (at the position of the conserved residue H34) seem to form the RNA-binding site of the second domain in three-domain bacterial S1 proteins. The F19, H34, and R68 residues are conserved in the third domain. It can be assumed that, in this group, the first domain binds RNA less efficiently.
For S1 proteins containing four structural domains, the maximum identity was found between the third and fourth domains (78%) and the minimum between the second and third domains. The third domain also has the highest percentage of identity (66%) among the domains in this group. The F19, F22, H34, and R68 residues are highly conserved in this domain. These residues are also conserved in the fourth S1 domain of this group. In the second domain, the F22, N64, and R68 residues form the RNA-binding site. In the first domain, only the R34 residue (at the position of the conserved H34 residue) is retained.
In the group of S1 proteins containing five structural domains, the third and fourth domains have the maximum percentage of identity (66%), while the second and fifth domains have the lowest (43%). In this group, the fourth domain has the highest percentage of identity among the domains (49%) 9. The first domain has no specific conserved motif residues; in the second domain, only the R68 residue of the RNA-binding site is retained. Although few sequences represent this group, in the remaining three domains the F19, F22, H34, and R68 residues apparently form the RNA-binding site.
For the most abundant group, S1 proteins containing six structural domains, as for S1 proteins with four and five domains, the maximum identity is found between the third and fourth domains (71%) and the minimum between the first and second domains (39%). The third domain has the highest percentage of identity among the domains in this group (68%) 9. In this domain, the RNA-binding site is formed by five residues: F19, L22 (at the position of the conserved F22 residue), H34, N64, and R68 (Figure 3b). The first and sixth domains have no specific conserved residues that could form an RNA-binding site. In the second domain, F22, N64, and R68 are retained. Four residues, F19, L22 (at the position of the conserved F22 residue), H34, and R68, are specific for the fourth domain in this group (Figure 3b). The obtained data are in good agreement with the experimental data confirming that removing one S1 domain from the C-terminus or two S1 domains from the N-terminus of the protein reduces only the efficiency of the protein's functions, but not its functional capabilities 14,41.