Precision = TP / (TP + FP)    Eq 2

Precision is the proportion of instances predicted as positive by the classifier that are truly positive, i.e., true positives out of all positive predictions.

Recall = TP / (TP + FN)    Eq 3

Recall is the proportion of true positive instances predicted correctly out of all actual positives (true positives plus false negatives). In other words, recall (also known as sensitivity) is the ability of a classifier to retrieve all relevant examples.

F1 score = 2*(Recall * Precision) / (Recall + Precision)    Eq 4

The F1 score incorporates both precision and recall into its calculation and ranges from 0.0 to 1.0; because it balances the two, it is often regarded as more informative than accuracy alone. To compare classifier models, it is therefore recommended to use the weighted average F1 score rather than overall accuracy.

Weighted average F1-score = \(\sum_{i=1}^{8}\left(\text{F1-score}_i \times \text{no.\ of\ instances}_i\right)/8\)    Eq 5
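As a minimal illustration of how the metrics above can be computed (a sketch, not the code used in this study), the snippet below uses scikit-learn; the label arrays y_true and y_pred are hypothetical placeholders for the eight His-modification classes, and note that scikit-learn's "weighted" average normalises the class-support weights by the total number of instances.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_recall_fscore_support)

# Hypothetical label arrays: each entry is one of the eight
# His-modification classes for a test-set sequence window.
y_true = np.array([0, 1, 2, 2, 3, 4, 5, 6, 7, 7])
y_pred = np.array([0, 1, 2, 3, 3, 4, 5, 6, 7, 6])

# Per-class precision (Eq 2), recall (Eq 3) and F1 score (Eq 4),
# together with the number of instances (support) of each class.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=list(range(8)), zero_division=0)

# Weighted-average F1: each class's F1 score is weighted by its number
# of instances (normalised by the total count), the metric recommended
# above for comparing the classifier models.
weighted_f1 = f1_score(y_true, y_pred, labels=list(range(8)),
                       average="weighted", zero_division=0)

print(f"accuracy    : {accuracy_score(y_true, y_pred):.3f}")
print(f"weighted F1 : {weighted_f1:.3f}")
```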
3. RESULTS AND DISCUSSION
3.1 Selection of the optimal window size based on neural network model performance:
To address the objective of the study, namely the prediction of multiple His modifications from a given protein sequence, we first optimized the length of the target amino acid sequence (variable window size from three to ten, as described in the Methods section). The results are shown for the CNN model. The accuracy of the CNN models followed an approximately Gaussian (bell-shaped) trend across window sizes three to ten (Table 4). The maximum accuracy on the training dataset was observed for window size seven, i.e., a sequence length of 15 amino acids (2 x 7 + 1). Hence, window size seven was selected as the default for subsequent model creation, training and validation.
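As a rough sketch of the windowing scheme described above (not the authors' pipeline), the function below extracts a symmetric window of w residues on either side of every histidine, giving fragments of length 2w + 1, and sweeps w from three to ten; the "X" padding for His residues near the sequence termini and the toy sequence are assumptions made for illustration.

```python
def extract_windows(sequence, w, pad="X"):
    """Return one (2*w + 1)-long fragment centred on every His (H)."""
    padded = pad * w + sequence + pad * w
    fragments = []
    for i, residue in enumerate(sequence):
        if residue == "H":
            centre = i + w                      # index in the padded sequence
            fragments.append(padded[centre - w:centre + w + 1])
    return fragments

# Example sweep: window size seven yields 15-residue fragments
# (2 x 7 + 1 = 15), the size that gave the highest training accuracy.
seq = "MKHLAVHTTRGDHSAQWERH"                    # toy protein sequence
for w in range(3, 11):                          # window sizes 3..10
    frags = extract_windows(seq, w)
    print(w, len(frags[0]), frags[0])
```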