4. Discussion
Almost one third of the PMMs reported in Table 2 involve arginine (R, 18.4%) or glycine (G, 14.5%), with the remaining variants almost uniformly distributed among the other amino acids (Fig. 1b). The peculiar characteristics of arginine and glycine have been invoked to explain this finding [27]. Arginine, despite being encoded by six different codons, is prone to replacement because four of them (CGN) contain a 5’-CpG dinucleotide, which shows a high probability of C-to-T and G-to-A transitions [28]. However, Fig. 1a shows that arginine is also very frequent among BMMs (15.1%), whereas glycine reaches only a roughly average value (6.5%), suggesting that different mechanisms underlie the pathogenicity of the two residues.
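The CpG argument can be made concrete by enumerating the arginine codons of the standard genetic code and applying the two CpG transitions to each. The sketch below is illustrative and not part of the authors' pipeline. It assumes the standard nuclear code and considers only CpG sites lying inside a single codon (CpG sites spanning two adjacent codons depend on sequence context and are ignored); `cpg_transitions` and `arginine_cpg_outcomes` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def cpg_transitions(codon):
    """Return the codons reachable by a CpG transition inside `codon`.

    Hypothetical helper: for each CpG site, two events are considered,
    C->T on the coding strand, and G->A on the coding strand (the same
    C->T transition read on the complementary strand).
    """
    mutants = []
    for i in range(len(codon) - 1):
        if codon[i:i + 2] == "CG":
            mutants.append(codon[:i] + "T" + codon[i + 1:])      # C->T
            mutants.append(codon[:i + 1] + "A" + codon[i + 2:])  # G->A
    return mutants


def arginine_cpg_outcomes():
    """Map each within-codon CpG transition of an Arg codon to its product."""
    outcomes = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "R":
            continue
        for mutant in cpg_transitions(codon):
            outcomes[f"{codon}>{mutant}"] = CODON_TABLE[mutant]
    return outcomes


if __name__ == "__main__":
    for change, product_aa in arginine_cpg_outcomes().items():
        print(change, "R ->", product_aa)
```

Of the six arginine codons, only the four CGN codons carry an internal CpG. Their transitions yield Cys (TGT, TGC), Trp (TGG), His (CAT, CAC), Gln (CAA, CAG) or a stop codon (TGA); the last is a nonsense rather than a missense change.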
A comparative topological analysis of BMMs and PMMs, summarized in Table 3, clearly indicates that arginine mutations are more often pathogenic when they occur at PISA-defined interfaces, particularly at protein-DNA interfaces, where arginine is more than six times more frequent among PMMs than among BMMs. This is fully consistent with the critical role that arginine plays in interactions with nucleic acids [29]. Moreover, arginine PMMs tend not to occur in buried protein regions or at protein-protein interfaces, whereas glycine PMMs show the opposite trend. Interestingly, glycine PMMs are also found at protein-ligand interfaces well above the average frequency, in agreement with the proposed role of this amino acid in stabilizing concave regions of the protein surface [30]. The prevalent localization of pathogenic glycine mutations indicates that replacing glycine with residues bearing larger side chains causes structural stress and, hence, functional changes in the mutated proteins.
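The "more than six times more frequent" comparison is a ratio of relative frequencies. Because the exact definition behind Table 3 is not restated in this section, the sketch below assumes that, for a given location class (e.g. a PISA-defined protein-DNA interface), the relative frequency of a residue is the fraction of mutations in that class involving it, computed separately for PMMs and BMMs. `enrichment_ratio` is a hypothetical helper, and the counts in the usage line are invented for illustration, not taken from Table 3.

```python
import math


def enrichment_ratio(pmm_count, pmm_total, bmm_count, bmm_total):
    """Relative frequency of a residue among PMMs divided by that among BMMs.

    pmm_count / pmm_total: PMMs involving the residue / all PMMs in the
    location class; bmm_count / bmm_total: the same for BMMs.
    Cross-multiplying keeps integer inputs exact until the final division.
    """
    if pmm_total <= 0 or bmm_total <= 0:
        raise ValueError("totals must be positive")
    if not (0 <= pmm_count <= pmm_total and 0 <= bmm_count <= bmm_total):
        raise ValueError("counts must lie between 0 and their totals")
    if bmm_count == 0:
        # Residue never observed among BMMs in this class: ratio is unbounded.
        return math.inf if pmm_count > 0 else math.nan
    return (pmm_count * bmm_total) / (pmm_total * bmm_count)


# Hypothetical counts (NOT from Table 3): 30 of 100 PMMs and 3 of 60 BMMs
# in one location class involve arginine.
print(enrichment_ratio(30, 100, 3, 60))  # 0.30 / 0.05 = 6.0
```

A ratio above 1 means the residue is over-represented among PMMs in that location class and a ratio below 1 that it is under-represented, as reported above for arginine in buried regions and at protein-protein interfaces. With small counts, a confidence interval or an exact test would be advisable before interpreting the ratio.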
The presence among BMMs of three arginine substitutions at protein-DNA interfaces might seem to contradict the relevance of this amino acid at such interfaces. We therefore manually inspected the structural features of these three BMMs, which occur in two transcription regulators, ZFP568 [31] and DUX4 [32], structurally resolved in PDB entries 5V3J and 5ZFZ, respectively. We used these two PDB structures to generate the R98/Q mutant structure of ZFP568 and the R411/Q and R599/H mutant structures of DUX4. Fig. 2 shows how the R/Q and R/H replacements can maintain protein-DNA binding through the glutamine side-chain amide group and the histidine imidazole group, respectively. Notably, in both cases the role of the original arginine was not to keep the protein tightly bound to DNA, as would be needed for histones, since transcription regulators are rather mobile proteins along DNA.
In summary, we used the large number of entries in the ClinVar database to generate maps of amino acid replacements, confirming that arginine and glycine are the residues most frequently involved in missense mutations. As expected, the comparison of BMMs and PMMs also showed that amino acid similarity plays a significant role in determining pathogenicity. Using a structural bioinformatics approach, with PISA as the protein interface analyzer, we searched at atomic resolution for the features responsible for the pathogenicity of missense mutations. Arginine and glycine, the residues most frequently involved in PMMs, emerged as representatives of two different mechanisms of pathogenicity: arginine replacements tend to be pathogenic when they affect interaction processes, whereas glycine substitutions can be deleterious whenever they cause structural stress in the mutated proteins. In the edgotype view of missense mutation effects [14], arginine mutations perturb network edges, whereas glycine mutations modify network nodes.
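This section does not restate which similarity measure underlies the BMM/PMM comparison. As one widely used option, the sketch below scores substitutions with BLOSUM62. The dictionary is a small hand-copied subset of that matrix covering only the arginine and glycine pairs used here, and the `is_conservative` rule with its zero threshold is a hypothetical illustration, not the authors' criterion. For real analyses the full matrix can be loaded, for example with Biopython's `substitution_matrices.load("BLOSUM62")`.

```python
# Hand-copied subset of BLOSUM62 (symmetric matrix, so each pair is stored once).
BLOSUM62_SUBSET = {
    ("R", "K"): 2, ("R", "Q"): 1, ("R", "H"): 0, ("R", "C"): -3, ("R", "W"): -3,
    ("G", "A"): 0, ("G", "S"): 0, ("G", "D"): -1, ("G", "R"): -2,
    ("G", "E"): -2, ("G", "V"): -3,
}


def blosum62(a, b):
    """Look up the BLOSUM62 score of the a<->b substitution in the subset."""
    for key in ((a, b), (b, a)):
        if key in BLOSUM62_SUBSET:
            return BLOSUM62_SUBSET[key]
    raise KeyError(f"{a}/{b} is not in this BLOSUM62 subset")


def is_conservative(wild_type, mutant, threshold=0):
    """Hypothetical rule: a substitution scoring >= threshold counts as conservative."""
    return blosum62(wild_type, mutant) >= threshold


for wt, mut in [("R", "Q"), ("R", "H"), ("R", "W"), ("G", "A"), ("G", "V")]:
    print(f"{wt}/{mut}", blosum62(wt, mut), is_conservative(wt, mut))
```

Under this toy rule the R/Q and R/H replacements discussed for ZFP568 and DUX4 count as conservative, whereas replacing glycine with a bulkier residue such as valine does not. BLOSUM62 reflects evolutionary substitution frequencies rather than structural context, so on its own it cannot capture the interface and burial effects discussed above.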
To advance genomic medicine, the structural characterization of PMMs could be extended beyond the current limits of the PISA database by implementing algorithms that work on reliably predicted structures.