\(\text{Precision} = \frac{TP}{TP + FP}\) Eq 2

Precision is the proportion of true positive instances among all instances predicted as positive by the classifier.

\(\text{Recall} = \frac{TP}{TP + FN}\) Eq 3

Recall is the proportion of true positive instances predicted correctly out of all actual positives (true positives plus false negatives). In other words, the ability of a classifier to retrieve all relevant examples is called recall (also known as sensitivity).

\(\text{F1 score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}\) Eq 4

Because F1 scores incorporate both precision and recall into their calculation, they are often regarded as superior to the accuracy metric. The F1 score ranges from 0.0 to 1.0. To compare classifier models, it is recommended to use the weighted average of F1 rather than overall accuracy.

Weighted average F1-score = \(\sum_{i=1}^{8}\left(\text{F1-score}_i \times \text{no. of instances}_i\right)/8\) Eq 5
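As a rough illustration of Eqs 2-5, the Python sketch below computes precision, recall, and F1 from per-class TP/FP/FN counts and then a weighted average F1 across classes. The class names and counts are invented placeholders, not values from this study, and the sketch normalises by the total instance count (a common convention) rather than by the number of classes.

```python
# Illustrative only: per-class confusion counts are placeholder values.
per_class = {
    # class: (TP, FP, FN, no. of instances)
    "class_1": (80, 10, 15, 95),
    "class_2": (40, 20, 10, 50),
}

def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0            # Eq 2

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0            # Eq 3

def f1(prec, rec):
    return 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0  # Eq 4

weighted_sum, total = 0.0, 0
for cls, (tp, fp, fn, n) in per_class.items():
    p, r = precision(tp, fp), recall(tp, fn)
    score = f1(p, r)
    print(f"{cls}: precision={p:.3f} recall={r:.3f} F1={score:.3f}")
    weighted_sum += score * n
    total += n

# Weighted average F1 (cf. Eq 5): each class F1 is weighted by its number
# of instances; normalisation by total instances is an assumed convention
# and may differ from the normalisation printed in Eq 5.
print(f"Weighted average F1 = {weighted_sum / total:.3f}")
```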
3 RESULTS AND DISCUSSION
3.1 Selection of the optimal window size based on neural network model performance
To address the objective of the study, prediction of multiple His modifications from a given protein sequence, we first attempted to optimize the target amino acid sequence length (variable window size from three to ten, as described in the Methods section). The results are shown for the CNN model. The accuracy of the CNN models follows a bell-shaped (approximately Gaussian) trend across window sizes three to ten (Table 4). The maximum accuracy on the training dataset was observed for window size seven (that is, a sequence length of 15 [2×7+1] amino acids). Hence, window size seven was selected as the default for subsequent model creation, training and validation.
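To make concrete what a window size of seven implies, the snippet below extracts the 15-residue peptide (2×7+1) centred on each histidine in a toy sequence. This is a minimal sketch, not the preprocessing code used in the study; the padding character and example sequence are illustrative assumptions.

```python
WINDOW = 7  # selected window size; peptide length = 2*7 + 1 = 15

def his_windows(sequence, window=WINDOW, pad="X"):
    """Yield (position, 15-mer) for every histidine (H) in the sequence.

    The sequence is padded with a placeholder residue so that His sites
    near the termini still yield fixed-length windows (an assumed convention).
    """
    padded = pad * window + sequence + pad * window
    for i, aa in enumerate(sequence):
        if aa == "H":
            centre = i + window
            yield i, padded[centre - window: centre + window + 1]

# Toy example sequence (not from the study's dataset)
for pos, peptide in his_windows("MKTAHLLVDHGRRAHE"):
    print(pos, peptide, len(peptide))
```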