Biological relevance of the model
Ranking the features by mutual information gain shows that the best dipeptide features were cysteine-phenylalanine, leucine-leucine, and lysine-cysteine, and that the best amino acid frequency features were those of valine, serine, arginine, asparagine, histidine, cysteine, and alanine. These top features were used for the model and gave good accuracy on our problem set.
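A minimal sketch of how such a ranking could be produced is given here, assuming protein sequences written in the 20-letter amino acid alphabet and a class label for each sequence. The function names (`composition_features`, `mutual_information`, `rank_features`), the equal-width binning, and the toy sequences are all illustrative assumptions, not the code used for the challenge. The mutual information estimate is a simple binned one, and a library estimator such as scikit-learn's `mutual_info_classif` could be used in its place.

```python
from collections import Counter
from itertools import product
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def composition_features(seq):
    """Return amino acid and dipeptide frequencies for one sequence."""
    seq = seq.upper()
    aa_counts = Counter(seq)
    # Overlapping windows of length 2 give the dipeptide counts.
    dp_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_aa = max(len(seq), 1)
    n_dp = max(len(seq) - 1, 1)
    feats = {f"AA_{a}": aa_counts[a] / n_aa for a in AMINO_ACIDS}
    feats.update({f"DP_{d}": dp_counts[d] / n_dp for d in DIPEPTIDES})
    return feats


def mutual_information(values, labels, bins=3):
    """Estimate I(X; Y) in bits after equal-width binning of the feature.

    Assumption: equal-width binning is used only for illustration.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: everything in bin 0
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(binned, labels))
    px = Counter(binned)
    py = Counter(labels)
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def rank_features(sequences, labels, top_k=10):
    """Score every composition feature against the labels, highest first."""
    rows = [composition_features(s) for s in sequences]
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical toy data, made up purely to exercise the functions.
    toy_sequences = ["CFCFCF", "CCFFCC", "LLALLA", "ALLALL"]
    toy_labels = [1, 1, 0, 0]
    for name, score in rank_features(toy_sequences, toy_labels, top_k=5):
        print(f"{name}\t{score:.3f}")
```

Feature names use the one-letter codes, so `DP_CF` is the cysteine-phenylalanine dipeptide and `AA_V` is the valine frequency. Frequencies are normalised by sequence length so that sequences of different lengths are comparable.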
Code repository for the challenge