3.3 Comparison of the results from the present study with
literature reports:
Currently, only a few His post-translational modification
prediction tools have been reported in the literature, namely pHisPred,
iPhosH-PseAAC, Prospect and the His-Cys metal binding prediction tool. Each
of these tools predicts only one His function at a time; the His-Cys metal
binding prediction tool predicts metal binding of His and Cys amino
acids. The training datasets used to develop these prediction tools
were small, and their sizes were comparable to that of the dataset used in
this study (Table 6). The internal prediction accuracies for
iPhosH-PseAAC, Prospect and pHisPred were 33%, 72% and 73%,
respectively. The His-Cys metal binding prediction tool (which predicts two
amino acids at a time) reported 73% precision and 61% recall. The best
internal prediction accuracy was obtained with the current model,
Hist-i-fy. However, there is scope for improving the model
performance as larger datasets become available. For external
validation, we tested the Hist-i-fy model on an independent dataset
of histidine phosphorylation generated by mass spectrometry (sample
size: 34). The prediction accuracy of the Hist-i-fy model on the test
dataset was 94.1%. Note that the training and the test datasets are
independent of each other and the test dataset consists of only one
modification, phosphorylation. Moreover, the training accuracy is a
cumulative accuracy over all the modifications, whereas the test accuracy
covers phosphorylation only. Thus, the accuracy observed in the test
dataset was higher than that in the training dataset. For comparison,
the same test dataset was used for histidine phosphorylation
prediction with the pHisPred tool. The prediction accuracy of pHisPred
was 94.0%, comparable to that of Hist-i-fy. For the first time,
we report prediction of eight histidine modifications from a given
protein sequence with reasonably high accuracy.
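
To make the reported percentages easier to interpret, the following minimal Python sketch shows how accuracy on a test set of 34 samples maps to counts of correct predictions, and how precision and recall combine into an F1 score. The function names are hypothetical and are not part of Hist-i-fy or any of the cited tools. The number of correct predictions is not stated in the text; the sketch only lists which counts would be consistent with the reported value under the standard definition of accuracy. The F1 score is not reported in the text and is derived here from the quoted precision and recall.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage: correct predictions / all predictions."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return 100.0 * correct / total


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


TEST_SIZE = 34  # sample size of the external test dataset, from the text

if __name__ == "__main__":
    # Accuracy values reachable with 34 samples near the reported range.
    # The counts are illustrative, not taken from the study.
    for correct in range(31, TEST_SIZE + 1):
        pct = accuracy_percent(correct, TEST_SIZE)
        print(f"{correct}/{TEST_SIZE} correct -> {pct:.1f}%")

    # One sample changes the accuracy by this many percentage points.
    print(f"step per sample: {100.0 / TEST_SIZE:.2f} points")

    # F1 derived from the precision (73%) and recall (61%) quoted above.
    print(f"derived F1: {f1_score(0.73, 0.61):.3f}")
```

With only 34 test samples, a single prediction shifts the accuracy by roughly 2.9 percentage points, so a difference of 0.1 percentage points between two tools is smaller than one sample and is best read as equivalent performance.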