Biological Relevance of the model

Ranking the features by mutual information gain shows that the dipeptides cysteine-phenylalanine, leucine-leucine and lysine-cysteine were the most informative dipeptide features. Among single residues, the frequencies of valine, serine, arginine, asparagine, histidine, cysteine and alanine were the most informative. These top-ranked features were used to train the model and gave good accuracy on our problem set.
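The write-up does not state how the frequencies were normalised or how mutual information was estimated, so the following is only a minimal sketch under stated assumptions. It computes amino acid composition (20 features) and dipeptide composition (400 ordered pairs of adjacent residues) for each sequence, then ranks every feature by a plug-in mutual information estimate after equal-width binning. The function names (`aa_composition`, `dipeptide_composition`, `mutual_information`, `rank_features`) and the `n_bins=4` default are hypothetical choices for illustration, and this estimator is not necessarily the one used in the repository.

```python
from collections import Counter
from itertools import product
import math

# The 20 standard amino acids (one-letter codes) and all 400 ordered dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Fraction of each standard amino acid in seq (non-standard letters ignored)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each ordered dipeptide among adjacent residue pairs in seq."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Assumption: pairs containing a non-standard residue are simply skipped.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs) or 1
    return {dp: counts[dp] / total for dp in DIPEPTIDES}


def mutual_information(values, labels, n_bins=4):
    """Plug-in MI estimate (in nats) between an equal-width-binned feature and class labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # a constant feature falls into a single bin
    binned = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    joint = Counter(zip(binned, labels))
    px, py = Counter(binned), Counter(labels)
    # I(X;Y) = sum p(x,y) * log(p(x,y) / (p(x) p(y))), with p estimated from counts.
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def rank_features(sequences, labels, top_k=10):
    """Return the top_k (feature, MI score) pairs over all 420 composition features."""
    rows = [{**aa_composition(s), **dipeptide_composition(s)} for s in sequences]
    names = list(rows[0])
    scores = {name: mutual_information([r[name] for r in rows], labels) for name in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

A call such as `rank_features(train_sequences, train_labels, top_k=10)` (with hypothetical input lists) returns the highest-scoring features, mixing single-residue keys such as `"V"` with dipeptide keys such as `"CF"`. Binning keeps the estimator simple and dependency-free; a library estimator for continuous features would avoid the choice of bin count.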

Code repository for the challenge

The code, along with the amino acid and dipeptide frequency data, is hosted at: https://github.com/souravsingh/Ideation-Challenge