3.4 Availability of Hist-i-fy:
The source code for Hist-i-fy is available at https://github.com/dibyansu24-maker/Histify and can be run on any platform by following the instructions on GitHub. Moreover, Hist-i-fy enables users to train the model on user-defined data.
The Hist-i-fy prediction server is hosted at https://histify.streamlit.app/ using Streamlit, a Python library for building interactive web applications. Through the Streamlit interface, users can predict sequence modifications in two modes: single sequence or multiple sequences (uploaded as a CSV file containing the sequences and histidine residue numbers). Streamlit integrates with GitHub, which simplifies hosting of the application. The Hist-i-fy model processes the input data and generates modification predictions for the respective sequences.
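As an illustration of the multiple-sequence mode, the following minimal sketch prepares a CSV file of sequences and histidine residue numbers and checks that each residue number actually points to a histidine. The column names (`sequence`, `his_position`), the 1-based residue numbering, and the helper functions are assumptions made for this example; the exact file layout expected by the server should be taken from the instructions on GitHub.

```python
import csv
import io


def validate_rows(rows):
    """Split (sequence, position) pairs into valid rows and error records.

    Positions are assumed to be 1-based residue numbers (an assumption,
    not confirmed by the Hist-i-fy documentation).
    """
    valid, errors = [], []
    for seq, pos in rows:
        seq = seq.strip().upper()
        if not 1 <= pos <= len(seq):
            errors.append((seq, pos, "position out of range"))
        elif seq[pos - 1] != "H":
            errors.append((seq, pos, "residue is not histidine"))
        else:
            valid.append((seq, pos))
    return valid, errors


def write_batch_csv(rows, handle):
    """Write validated rows to an open text handle as CSV."""
    writer = csv.writer(handle)
    # Hypothetical header names; adjust to the layout the server expects.
    writer.writerow(["sequence", "his_position"])
    writer.writerows(rows)


# Toy input: only the first entry has a histidine at the given position.
rows = [("MKHLA", 3), ("MKHLA", 2), ("MKH", 5)]
valid, errors = validate_rows(rows)

buffer = io.StringIO()
write_batch_csv(valid, buffer)
csv_lines = buffer.getvalue().splitlines()
```

Checking the residue at each position before upload catches off-by-one numbering mistakes, which would otherwise lead to predictions for a non-histidine residue or to a rejected file.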
4. CONCLUSION:
The functional characterization of proteins and their amino acids lags behind protein sequence determination. Experimental characterization is time-consuming and laborious, and it can be complemented by computational methods. Because histidine (His) is one of the most important amino acids at enzyme active sites, its functional characterization could help address many biological problems. His undergoes multiple modifications, sixteen of which are reported in the UniProt database. However, only a handful of His function prediction tools are available, and each predicts a single function. The objective of this study was to predict multiple histidine modifications (functions) from protein sequences. We trained and validated a Convolutional Neural Network (CNN) model on eight histidine modifications, using a training dataset curated from the UniProt database. The overall accuracy of the CNN model was 75%, although the prediction accuracy for individual modifications varies with the data size, the existing sequence patterns, and other factors. The external validity of the model was tested on an independent phosphorylation dataset obtained from a proteomics study. The accuracy of phosphorylation prediction in this external validation was 94.1%, much higher than the overall accuracy across all eight modifications in the internal validation. The external validation result was comparable to that of an existing His phosphorylation prediction tool, pHisPred. The final CNN model, termed Hist-i-fy, is publicly available as a web application and as a stand-alone program.
5. REFERENCES:
[1] M. Cui, C. Cheng, and L. Zhang, “High-throughput proteomics: a methodological mini-review,” Lab. Investig. J. Tech. Methods Pathol., vol. 102, no. 11, pp. 1170–1181, Nov. 2022, doi: 10.1038/s41374-022-00830-7.
[2] T. K. Harris and G. J. Turner, “Structural Basis of Perturbed pKa Values of Catalytic Groups in Enzyme Active Sites,” IUBMB Life, vol. 53, no. 2, pp. 85–98, 2002, doi: 10.1080/15216540211468.
[3] A. Gutteridge and J. M. Thornton, “Understanding nature’s catalytic toolkit,” Trends Biochem. Sci., vol. 30, no. 11, pp. 622–629, Nov. 2005, doi: 10.1016/j.tibs.2005.09.006.
[4] A. Bhatnagar and D. Bandyopadhyay, “Characterization of cysteine thiol modifications based on protein microenvironments and local secondary structures,” Proteins, vol. 86, no. 2, pp. 192–209, Feb. 2018, doi: 10.1002/prot.25424.
[5] V. Nallapareddy, S. Bogam, H. Devarakonda, S. Paliwal, and D. Bandyopadhyay, “DeepCys: Structure-based multiple cysteine function prediction method trained on deep neural network: Case study on domains of unknown functions belonging to COX2 domains,” Proteins, vol. 89, no. 7, pp. 745–761, Jul. 2021, doi: 10.1002/prot.26056.
[6] A. Bhatnagar, M. I. Apostol, and D. Bandyopadhyay, “Amino acid function relates to its embedded protein microenvironment: A study on disulfide-bridged cystine,” Proteins Struct. Funct. Bioinforma., vol. 84, no. 11, pp. 1576–1589, 2016, doi: 10.1002/prot.25101.
[7] Z. Chen et al., “PROSPECT: A web server for predicting protein histidine phosphorylation sites,” J. Bioinform. Comput. Biol., vol. 18, Mar. 2020, doi: 10.1142/S0219720020500183.
[8] J. Zhao et al., “pHisPred: a tool for the identification of histidine phosphorylation sites by integrating amino acid patterns and properties,” BMC Bioinformatics, vol. 23, Sep. 2022, doi: 10.1186/s12859-022-04938-x.
[9] A. Passerini, M. Punta, A. Ceroni, B. Rost, and P. Frasconi, “Identifying cysteines and histidines in transition-metal-binding sites using support vector machines and neural networks,” Proteins Struct. Funct. Bioinforma., vol. 65, no. 2, pp. 305–316, 2006, doi: 10.1002/prot.21135.
[10] L. C. Szymczak and M. Mrksich, “Using Peptide Arrays To Discover the Sequence-Specific Acetylation of the Histidine-Tyrosine Dyad,” Biochemistry, vol. 58, no. 13, Apr. 2019, doi: 10.1021/acs.biochem.9b00022.
[11] S. Larsen et al., “Mapping Physiological ADP-Ribosylation Using Activated Ion Electron Transfer Dissociation,” Cell Rep., vol. 32, p. 108176, Sep. 2020, doi: 10.1016/j.celrep.2020.108176.
[12] H. Minnee et al., “Mimetics of ADP-ribosylated histidine through copper(I)-catalyzed click chemistry,” Org. Lett., vol. 24, no. 21, pp. 3776–3780, May 2022, doi: 10.1021/acs.orglett.2c01300.
[13] D. Dutta, C. Mandal, and C. Mandal, “Unusual glycosylation of proteins: Beyond the universal sequon and other amino acids,” Biochim. Biophys. Acta BBA - Gen. Subj., vol. 1861, no. 12, pp. 3096–3108, Dec. 2017, doi: 10.1016/j.bbagen.2017.08.025.
[14] G. Zurlo, J. Guo, M. Takada, W. Wei, and Q. Zhang, “New Insights into Protein Hydroxylation and Its Important Role in Human Diseases,” Biochim. Biophys. Acta, vol. 1866, no. 2, pp. 208–220, Dec. 2016, doi: 10.1016/j.bbcan.2016.09.004.
[15] S. Markolovic, S. E. Wilkins, and C. J. Schofield, “Protein Hydroxylation Catalyzed by 2-Oxoglutarate-dependent Oxygenases,” J. Biol. Chem., vol. 290, no. 34, pp. 20712–20722, Aug. 2015, doi: 10.1074/jbc.R115.662627.
[16] M. E. Jakobsson, “Enzymology and significance of protein histidine methylation,” J. Biol. Chem., vol. 297, no. 4, Oct. 2021, doi: 10.1016/j.jbc.2021.101130.
[17] C.-F. Xu et al., “Discovery and Characterization of Histidine Oxidation Initiated Cross-links in an IgG1 Monoclonal Antibody,” Anal. Chem., vol. 89, no. 15, pp. 7915–7923, Aug. 2017, doi: 10.1021/acs.analchem.7b00860.
[18] C. Schöneich, “Reactive oxygen species and biological aging: a mechanistic approach,” Exp. Gerontol., vol. 34, no. 1, pp. 19–34, Jan. 1999, doi: 10.1016/S0531-5565(98)00066-7.
[19] J. D. Bridgewater, R. Srikanth, J. Lim, and R. W. Vachet, “The Effect of Histidine Oxidation on the Dissociation Patterns of Peptide Ions,” J. Am. Soc. Mass Spectrom., vol. 18, no. 3, pp. 553–562, Mar. 2007, doi: 10.1016/j.jasms.2006.11.001.
[20] J. Zhao et al., “HisPhosSite: A comprehensive database of histidine phosphorylated proteins and sites,” J. Proteomics, vol. 243, p. 104262, Jul. 2021, doi: 10.1016/j.jprot.2021.104262.
[21] P. M. Wolanin, P. A. Thomason, and J. B. Stock, “Histidine protein kinases: key signal transducers outside the animal kingdom,” Genome Biol., vol. 3, no. 10, p. reviews3013.1-reviews3013.8, 2002, doi: 10.1186/gb-2002-3-10-reviews3013.
[22] Z. Du et al., “Highly Conserved Histidine Plays a Dual Catalytic Role in Protein Splicing: A pKa Shift Mechanism,” J. Am. Chem. Soc., vol. 131, no. 32, pp. 11581–11589, Aug. 2009, doi: 10.1021/ja904318w.
[23] J. R. Herrmann, J. C. Panitz, S. Unterreitmeier, A. Fuchs, D. Frishman, and D. Langosch, “Complex Patterns of Histidine, Hydroxylated Amino Acids and the GxxxG Motif Mediate High-affinity Transmembrane Domain Interactions,” J. Mol. Biol., vol. 385, no. 3, pp. 912–923, Jan. 2009, doi: 10.1016/j.jmb.2008.10.058.
[24] M. Lv et al., “METTL9 mediated N1-histidine methylation of zinc transporters is required for tumor growth,” Protein Cell, vol. 12, no. 12, pp. 965–970, Dec. 2021, doi: 10.1007/s13238-021-00857-4.
[25] L. D. L. Jedlicka et al., “Increased chemical acetylation of peptides and proteins in rats after daily ingestion of diacetyl analyzed by Nano-LC-MS/MS,” PeerJ, vol. 6, p. e4688, Apr. 2018, doi: 10.7717/peerj.4688.
[26] K. Terashima et al., “Impurity effects on electron–mode coupling in high-temperature superconductors,” Nat. Phys., vol. 2, no. 1, Art. no. 1, Jan. 2006, doi: 10.1038/nphys200.
[27] The UniProt Consortium, “UniProt: the universal protein knowledgebase in 2021,” Nucleic Acids Res., vol. 49, no. D1, pp. D480–D489, Jan. 2021, doi: 10.1093/nar/gkaa1100.
[28] C. M. Potel, M.-H. Lin, A. J. R. Heck, and S. Lemeer, “Widespread bacterial protein histidine phosphorylation revealed by mass spectrometry-based proteomics,” Nat. Methods, vol. 15, no. 3, pp. 187–190, Mar. 2018, doi: 10.1038/nmeth.4580.
[29] “Tokenization and Text Data Preparation with TensorFlow & Keras,” KDnuggets. https://www.kdnuggets.com/tokenization-and-text-data-preparation-with-tensorflow-keras.html (accessed Apr. 14, 2023).
[30] “sklearn.preprocessing.LabelBinarizer,” scikit-learn. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html (accessed Apr. 14, 2023).