3.3 Comparison of the results of the present study with literature reports
Only a few prediction tools for His post-translational modifications have been reported in the literature, namely pHisPred, iPhosH-PseAAC, Prospect and a His-Cys metal binding predictor. Each of these tools predicts a single His function; the His-Cys metal binding predictor covers metal binding for both His and Cys residues. The training datasets used to develop these tools were small and comparable in size to the dataset used in this study (Table 6). The reported internal prediction accuracies for iPhosH-PseAAC, Prospect and pHisPred were 33%, 72% and 73%, respectively. The His-Cys metal binding predictor (which predicts two amino acids at a time) reported 73% precision and 61% recall. The best internal prediction accuracy was obtained with the current model, Hist-i-fy. However, there is scope for improving the model performance as larger datasets become available.

For external validation, we tested the Hist-i-fy model on an independent histidine phosphorylation dataset generated by mass spectrometry (sample size, 34). The prediction accuracy of Hist-i-fy on this test dataset was 94.1%. Note that the training and test datasets are independent of each other and that the test dataset contains only one modification, phosphorylation. Moreover, the training accuracy is a cumulative accuracy over all the modifications, whereas the test accuracy is for phosphorylation alone. This explains why the accuracy observed on the test dataset was higher than that on the training dataset. For comparison, the same test dataset was used for histidine phosphorylation prediction with the pHisPred tool. The prediction accuracy of pHisPred was 94.0%, comparable to that of Hist-i-fy. For the first time, we report the prediction of eight histidine modifications from a given protein sequence with reasonably high accuracy.
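The metrics quoted in this comparison reduce to simple counting, and with a test set of only 34 samples each prediction changes the accuracy by about 3 percentage points. The following Python sketch is illustrative only and is not the evaluation code used in this study; the function names are hypothetical. It shows how accuracy, precision and recall are defined, and how a reported percentage can be mapped back to the number of correct predictions it implies.

```python
def accuracy(correct: int, total: int) -> float:
    """Percentage of predictions that are correct."""
    return 100.0 * correct / total


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are true positives."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that are recovered."""
    return tp / (tp + fn)


def counts_matching(total: int, reported_pct: float, decimals: int = 1) -> list[int]:
    """Return every count of correct predictions whose accuracy,
    rounded to `decimals` places, equals the reported percentage."""
    return [
        k
        for k in range(total + 1)
        if abs(round(accuracy(k, total), decimals) - reported_pct) < 1e-9
    ]


# Assumption: the reported accuracy is a rounded value of correct / total.
# With 34 test samples, 94.1% corresponds to 32 correct predictions.
print(counts_matching(34, 94.1))  # [32]
```

Because one additional correct or incorrect prediction moves the accuracy from 94.1% to 97.1% or 91.2%, differences of a fraction of a percentage point between tools on a test set of this size should be read as equivalent performance.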