The Whole is Greater than its Parts -- Ensembling Improves Protein Contact Prediction

Wendy Billings; Connor Morris; Dennis Della Corte

doi:10.22541/au.160317361.15075213/v1

Abstract

The prediction of amino acid contacts from protein sequence is an important problem, as predicted contacts are a vital intermediate step towards the prediction of folded protein structures. We propose that ensembling, a powerful concept from deep learning, can increase the accuracy of protein contact predictions by combining the outputs of different neural network models. We show that ensembling the predictions made by different groups at the 13th Critical Assessment of Protein Structure Prediction (CASP13) outperforms every individual group. Further, we show that contacts derived from the distance predictions of three additional deep neural networks (AlphaFold, trRosetta, and ProSPr) can be substantially improved by ensembling all three networks. In a final assessment, we show that ensembling these recent deep neural networks with the best CASP13 group creates a superior contact prediction tool. To enable novel methods that build on these findings, we propose the creation of a better protein contact benchmark set and additional open-source contact prediction methods.
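As a rough illustration of the ensembling idea described in the abstract, the sketch below averages per-residue-pair contact probabilities from several models and then ranks the residue pairs. It is a minimal sketch under stated assumptions, not the authors' implementation: the unweighted mean as the combination rule, the function names `ensemble_contact_maps` and `top_contacts`, the `min_separation` parameter, and the toy probability values are all hypothetical and do not come from the abstract.

```python
def ensemble_contact_maps(maps):
    """Combine L x L contact probability maps by an unweighted mean.

    Assumption: a simple average is used here for illustration only;
    the abstract does not state which combination rule the authors use.
    """
    if not maps:
        raise ValueError("need at least one contact map")
    n = len(maps[0])
    for m in maps:
        if len(m) != n or any(len(row) != n for row in m):
            raise ValueError("all maps must be square and share the same length")
    k = len(maps)
    return [[sum(m[i][j] for m in maps) / k for j in range(n)]
            for i in range(n)]


def top_contacts(contact_map, top_k, min_separation=1):
    """Return the top_k residue pairs (i, j), i < j, ranked by probability.

    min_separation is a hypothetical parameter: pairs closer than this
    in sequence are ignored.
    """
    n = len(contact_map)
    scored = [(contact_map[i][j], i, j)
              for i in range(n)
              for j in range(i + min_separation, n)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(i, j) for _, i, j in scored[:top_k]]


# Toy maps for a 3-residue protein from two hypothetical models.
model_a = [[0.0, 0.2, 1.0],
           [0.2, 0.0, 0.0],
           [1.0, 0.0, 0.0]]
model_b = [[0.0, 0.6, 0.8],
           [0.6, 0.0, 0.2],
           [0.8, 0.2, 0.0]]

ensemble = ensemble_contact_maps([model_a, model_b])
ranked = top_contacts(ensemble, top_k=2)
```

An unweighted mean treats every model as equally reliable; weighting models by their individual accuracy would be an alternative design, and is likewise an assumption rather than something the abstract specifies.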

13 Apr 2021. Published in Scientific Reports, volume 11, issue 1. doi:10.1038/s41598-021-87524-0

Peer review status: Published