Introduction
Three-dimensional structures of proteins and other macromolecules are
often required for detailed studies and for understanding mechanisms
and functions. Although new structures are determined at a fast pace,
the gap between known sequences and structures is widening (Kc, 2017).
Due to the low cost and availability of sequencing facilities, the
numbers of sequenced genes and genomes from novel organisms are
expanding. Structures are still missing for many proteins, even in
well-studied organisms. For example, structures for a large part of the
human proteome are not yet available.
Protein modelling may be a useful alternative when an experimental
structure does not exist. Models and the process of predicting them are
often poorly documented in the scientific literature. Unfortunately,
this applies to descriptions of experimental studies as well. An
analysis of 268 biomedical publications with experimental data revealed
that only one article (0.37%) included a full report of protocols
(Iqbal et al., 2016). This paper provides guidelines for detailed and
proper reporting of modelling studies.
Protein structures can be modelled with methods from three major
categories of tools; for a review, see (Dorn et al., 2014; Kuhlman and
Bradley, 2019). Ab initio methods predict structures from scratch
based on the energetically most favorable conformations. These methods
require extensive computational resources. The goal of fold recognition
(also called threading) methods is to reveal the fold type of the
protein of interest. Homology modelling, based on one or several
related, known structures, often provides the most reliable atom-level
models. All these approaches are complicated and include several steps.
A full description of these studies is the exception rather than the
norm.
Inadequate descriptions prevent full comprehension of the models,
assessment of their quality, and their evaluation, extension and
repetition in new studies. Proper reporting would allow readers to spot
problematic cases and details when peer review has failed to detect the
deficiencies. In recent years, scientific communities have awakened to
the reproducibility (repeatability) crisis (Baker, 2016; Begley and
Ioannidis, 2015), one reason for which is inadequate description of
studies.
Strategies and tips have been published on how to model structures
(Dhingra et al., 2020; Dorn et al., 2014; Haddad et al., 2020); however,
there are no instructions for describing protein models despite a
recognized need (see e.g. Schwede et al., 2009). The only available
guidelines are for small molecules related to medicinal chemistry,
mainly drugs (Gund et al., 1988). Related recommendations and guidelines
have been published in bioinformatics, e.g. for computer-aided variation
interpretation (Vihinen, 2012, 2013) and for sequence alignments
(Vihinen, 2020). Minimum Information About Bioinformatics Investigation
(MIABI, https://fairsharing.org/FAIRsharing.28yec8) (Tan et al., 2010)
provides basic reporting guidelines that cover the algorithm used, the
analysis protocol, and the databases, resources, software and
(web) services used. However, more detailed data are needed for
structural models.
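As a rough illustration of the MIABI-style items listed above, the
following minimal Python sketch stores them as a structured record and
flags the items that have not been reported. The class name, field names
and placeholder values are hypothetical and chosen only for this example;
they are not part of MIABI or of any published schema.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ModellingReport:
    """Hypothetical record of basic reporting items for one modelling study."""
    algorithm: str = ""
    analysis_protocol: str = ""
    databases: list = field(default_factory=list)
    resources: list = field(default_factory=list)
    software: list = field(default_factory=list)
    web_services: list = field(default_factory=list)

    def missing_items(self):
        """Return the names of items that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


# Placeholder values only; a real report would name actual tools and versions.
report = ModellingReport(
    algorithm="homology modelling",
    databases=["placeholder structure database"],
    software=["placeholder modelling program, version not stated"],
)

print(json.dumps(asdict(report), indent=2))
print("Not yet reported:", report.missing_items())
```

A record of this kind could be exported as supplementary material; the
more detailed, structure-specific items called for in the text would need
additional fields.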
The goal here is to provide guidelines for reporting the use and
results of molecular modelling. The guidelines originate from
frustration in reading published articles and manuscripts that do not
contain sufficient details to allow the reader to comprehend and
evaluate the modelling process, the quality of the output and the
validity of the predictions made, e.g. in relation to function or
diseases. These guidelines are simple to follow and apply. Modelling
studies should be described with a similar degree of detail as
experimental articles. As many journals do not allow reporting full
methodological details in the main article, details can be provided in
supplementary material and other parts of the article. Many details can
also be included in tables, figures and figure captions.