2.1.3 Glycosylation
Different amino acids may undergo different types of glycosylation, namely, C-linked, N-linked, O-linked, or S-linked [13]. Histidine undergoes N-linked glycosylation.
2.1.4 Hydroxylation
Protein hydroxylation is a post-translational modification carried out by 2-oxoglutarate-dependent dioxygenases. A well-studied example is the hydroxylation of proline residues in hypoxia-inducible factor alpha (HIF-α) [14]. Hydroxylation may also involve protein-protein interactions and downstream signalling. Apart from proline, the residues lysine, asparagine, aspartate, and histidine can also be hydroxylated [15].
2.1.5 Methylation
Actin and myosin undergo histidine methylation as a post-translational modification (PTM). The methyl group can be added at either of two positions on the histidine side chain, giving 1-methylhistidine (1MeH) or 3-methylhistidine (3MeH) [16].
2.1.6 Oxidation
Under unusual or stressed conditions, histidine is oxidized to 2-oxo-histidine (2-oxo-His) (Figure 1). Photo-induced oxidation of histidine, leading to various cross-links involving intact His, Lys, and Cys, was observed in high-molecular-weight (HMW) fractions of monoclonal antibodies [17]. Oxidation of His residues is also observed in proteins from cells undergoing oxidative stress [18]. The presence of 2-oxo-His changes the dissociation pattern of peptide ions in mass spectrometry studies [19].
2.1.7 Phosphorylation
His phosphorylation is a crucial step in various cellular processes, such as signal transduction, cell cycle, proliferation, differentiation, and apoptosis. Phosphorylated His accounts for 6% of all phosphorylated amino acids. However, phosphorylation of His is less explored than that of serine, threonine, and tyrosine. A consolidated database of phosphorylated His sites (HisPhosSite) has recently become available [20]. Histidine kinases (HKs) are classical kinases found outside the animal kingdom that phosphorylate His, although in a two-step manner: (i) the phosphate is transferred from ATP to His, and (ii) it is then transferred to an aspartate residue [21].
2.1.8 Protein Splicing
Protein splicing is triggered via acid-base catalysis that involves multiple conserved His residues at the active site. Histidine probably plays a dual role in protein splicing, first as a general base to start the acyl shift and then as a general acid to break the scissile bond at the N-terminal splicing junction [22].
2.2 Sequence signatures around different His post-translational modifications
Many His post-translational modifications have been associated with specific sequence signatures or motifs. For example, the His hydroxylation motif is part of a hydrogen-bond (H-bond) cluster that is brought into register by a GXXG motif [23]. For His methylation, the common motif observed in short methylated peptides was GHXHXH [24]. The histidine acetylation motif deduced from mass spectrometry data on proteins from diacetyl-fed rat lung was GXPGXXGHXGXXG [25]. However, some histidine post-translational modifications do not carry sequence signatures. For example, no specific sequence motif has been reported for His glycosylation, and no clear sequence motif has been identified for His phosphorylation [26].
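The motifs quoted above can be scanned for in a protein sequence with a simple pattern search. The following is a minimal sketch, not the procedure used in the cited studies: it assumes that "X" stands for any of the twenty standard amino acids, and the names `MOTIFS`, `motif_to_regex`, and `find_motif` are hypothetical helpers introduced only for illustration.

```python
import re

# Motifs as quoted in the text; "X" is assumed to mean "any standard amino acid".
MOTIFS = {
    "GXXG (brings the hydroxylation H-bond cluster into register)": "GXXG",
    "methylation": "GHXHXH",
    "acetylation": "GXPGXXGHXGXXG",
}

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_regex(motif):
    """Translate a motif such as 'GHXHXH' into a compiled regular expression.

    The pattern is wrapped in a lookahead so that overlapping occurrences
    are also reported.
    """
    body = "".join(f"[{STANDARD_AA}]" if ch == "X" else ch for ch in motif)
    return re.compile(f"(?=({body}))")


def find_motif(sequence, motif):
    """Return (1-based start position, matched segment) for every occurrence."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "AAGHAHCHGG"  # made-up sequence, for illustration only
    for label, motif in MOTIFS.items():
        print(label, find_motif(toy_sequence, motif))
```

A match only indicates that the sequence pattern is present. It does not show that the histidine is actually modified, and, as noted above, glycosylation and phosphorylation have no reported motif to search for.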
2.3 Training dataset generation for Histidine post-translational modifications
Eight His post-translational modifications (Figure 1) were annotated in this work, based on the availability of protein sequences in the UniProt database [27]. From the “Keyword” subsection of UniProt, the category “PTM” was selected to track all possible post-translational modifications. The case-insensitive text filters “His” and “Histidine” were used to identify the experimentally annotated His functions in the PTM category, as curated in November 2022. A total of sixteen modifications were identified, some of which have very few data points. Finally, the eight modifications with at least twenty data points each were selected for the training dataset (Table 1).
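The selection logic described above (a case-insensitive text filter followed by a minimum of twenty data points) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the record layout `(modification, accession, position)`, the function names, and the toy labels are all hypothetical, and no UniProt download or API call is shown.

```python
from collections import Counter

MIN_DATA_POINTS = 20                  # threshold stated in the text
FILTER_TERMS = ("his", "histidine")   # text filters, applied case-insensitively


def mentions_histidine(label):
    """Case-insensitive substring match against the text filters.

    Plain substring matching is an assumption made here; it can over-match
    labels that merely contain the letters 'his'.
    """
    lowered = label.lower()
    return any(term in lowered for term in FILTER_TERMS)


def select_modifications(site_records, min_points=MIN_DATA_POINTS):
    """Count annotated sites per modification and keep the well-populated ones.

    site_records: iterable of (modification, accession, position) tuples.
    Returns {modification: number_of_sites} for modifications that pass the
    text filter and have at least `min_points` sites.
    """
    counts = Counter(
        mod for mod, _accession, _position in site_records if mentions_histidine(mod)
    )
    return {mod: n for mod, n in counts.items() if n >= min_points}


if __name__ == "__main__":
    # Toy records with made-up labels and accessions, for illustration only.
    records = (
        [("phospho-His", "ACC1", i) for i in range(20)]
        + [("methyl-Histidine", "ACC2", i) for i in range(19)]
        + [("phospho-Ser", "ACC3", i) for i in range(30)]
    )
    print(select_modifications(records))
```

In this toy input, only the first label passes both the text filter and the twenty-site threshold; the second falls one site short and the third does not mention histidine.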