Abstract
A single amino acid variation (SAV) is an amino acid substitution in a
protein sequence that may alter the overall protein structure, binding
affinity, or functional domains and may be related to disease,
including cancer. However, clarifying the relationship between SAVs and
cancer through traditional experiments is time- and resource-consuming.
Although some computational methods for SAV prediction exist, most of
them predict the change in protein stability caused by an SAV. In
this work, the SAV characteristics derived from protein sequences,
structures, and micro-environments were converted into feature vectors
and fed into an integrated prediction system built with a Support
Vector Machine and a genetic algorithm. The critical features were then
used to assess the relationship between their properties and cancer
caused by SAVs. We developed a prediction system based on protein
sequence and structure that distinguishes whether an SAV is related to
cancer, achieving an accuracy, Matthews correlation coefficient, and
F1-score of 90.88%, 0.77, and 0.83, respectively. Moreover, an online
prediction server called CanSavPre was
built (http://bioinfo.cmu.edu.tw/CanSavPre/), which is intended as a
practical tool for cancer research and precision medicine.
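
For readers unfamiliar with the three reported measures, the following is a minimal sketch of how accuracy, the Matthews correlation coefficient, and the F1-score are computed from a binary confusion matrix using their standard definitions. The function name and the example counts are hypothetical and chosen only for illustration; the abstract does not report the underlying confusion matrix, so these numbers are not the study's data.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, MCC, F1) for a binary confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives (here, "positive" would mean
    "cancer-related SAV"; this labeling is an assumption).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    return accuracy, mcc, f1


# Hypothetical counts, for illustration only (not taken from the paper).
acc, mcc, f1 = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={acc:.4f}  MCC={mcc:.4f}  F1={f1:.4f}")
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy and F1 when the two classes are unevenly represented.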