Figure 4: An example of GNINA docking results on the Hsp90 receptor in complex with ligand 9J0 (PDB ID: 5ZR3). The crystal structure of the receptor is shown in purple with the ground truth ligand in green. The AlphaFold model is shown in gray. The GNINA docked conformation using the AlphaFold model as input is in white and the docked conformation using the crystal structure as input is in orange.
While these results have proven valuable for testing our automated workflow, they are not meant to be a comprehensive evaluation of these PLC prediction tools, especially in the context of the challenges and concepts discussed in the previous sections. The small set of 363 PLCs used has three limitations: (1) 108 protein-ligand pairs have peptide or oligosaccharide ligands, which are not ideal because most docking tools are not calibrated for these ligand types [31]; (2) only 104 of the remaining 255 small molecule and ion pockets pass the relaxed validation criteria; and (3) the test set was created using a time-based split and thus contains redundant proteins, both within itself, indicating a biased representation of PLC space, and with respect to the PDBBind training set, indicating that prediction results are overestimated for the tools trained on this set. Thus, it is critical to repeat this analysis on a diverse benchmarking dataset created with both structure quality and PLC diversity taken into account, after ensuring that the PLC prediction tools based on machine learning or deep learning are trained on a dataset different from the benchmark set. This will ensure a more reliable and comprehensive evaluation and allow problem cases for different tools to be pinpointed more specifically, aiding their further development.
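The attrition described above can be tallied in a few lines. The following is a minimal sketch that uses only the counts stated in the text; the variable names are illustrative and not part of the workflow.

```python
# Counts taken from the text; variable names are illustrative only.
total_plcs = 363
peptide_or_oligosaccharide = 108
passing_validation = 104

# Pairs left after setting aside peptide and oligosaccharide ligands.
small_molecule_or_ion = total_plcs - peptide_or_oligosaccharide

# Pass rate of the relaxed validation criteria, relative to two denominators.
frac_of_remaining = passing_validation / small_molecule_or_ion
frac_of_total = passing_validation / total_plcs

print(f"small molecule and ion pairs: {small_molecule_or_ion}")
print(f"pass rate among those pairs: {frac_of_remaining:.1%}")
print(f"pass rate over the full set: {frac_of_total:.1%}")
```

Under these counts, well under half of the small molecule and ion pairs, and under a third of the full set, pass the relaxed validation criteria.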
For four out of the 363 complexes, the workflow failed due to issues with various steps in the process. RDKit could not generate conformers for the stapled peptide ligand of 6q4q, which caused both DiffDock inference and the definition of the search box required to run AutoDock Vina, SMINA, and GNINA to fail. For the 6o0h protein-ligand pair, DiffDock failed because the language model embeddings did not have the right length for the protein. In addition, 6uhu and 6rtn failed to run with AutoDock Vina due to the presence of unsupported atoms. Furthermore, for the 6d07 receptor, P2Rank was unable to predict a binding pocket. During the analysis of the 256 AlphaFold-modeled receptors, P2Rank failed to predict a binding pocket for three receptors (6d07, 6d08, and 6qlt). Complexes 6o0h and 6uhu suffered from the same issues mentioned above. In addition, DiffDock inference failed for three more complexes (6cjj, 6jib, and 6jid). These failures were automatically identified, reported, and isolated by the workflow. Overall, we demonstrate that automated workflows can be employed for PLC preparation, prediction, and assessment.
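To illustrate what identifying, reporting, and isolating failures per complex and per step can look like, here is a minimal sketch. It is not the workflow's actual implementation: `WorkflowReport`, `run_isolated`, and the toy step functions are hypothetical names introduced for illustration, and the toy steps only stand in for real tools such as a pocket predictor or a docking program.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class WorkflowReport:
    """Per-complex, per-step outcomes (hypothetical structure)."""
    succeeded: List[Tuple[str, str]] = field(default_factory=list)
    failed: Dict[Tuple[str, str], str] = field(default_factory=dict)


def run_isolated(
    complex_ids: Iterable[str],
    steps: Dict[str, Callable[[str], None]],
) -> WorkflowReport:
    """Run every step on every complex; a failure never stops the others."""
    report = WorkflowReport()
    for cid in complex_ids:
        for name, step in steps.items():
            try:
                step(cid)
            except Exception as exc:
                # Record the reason and continue with the next step/complex.
                report.failed[(cid, name)] = f"{type(exc).__name__}: {exc}"
            else:
                report.succeeded.append((cid, name))
    return report


# Toy stand-ins for real tools (assumptions, for illustration only).
def toy_pocket_finder(cid: str) -> None:
    if cid == "no_pocket":
        raise ValueError("no binding pocket predicted")


def toy_docker(cid: str) -> None:
    if cid == "bad_atoms":
        raise RuntimeError("unsupported atom type")


report = run_isolated(
    ["ok", "no_pocket", "bad_atoms"],
    {"pocket": toy_pocket_finder, "dock": toy_docker},
)
for (cid, step_name), reason in report.failed.items():
    print(f"{cid} [{step_name}] failed: {reason}")
```

Catching exceptions at the level of a single step on a single complex keeps one problematic input from aborting the whole batch, and the collected reasons can be summarized afterwards in the same way the failures are listed above.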