Introduction
Three-dimensional structures of proteins and other macromolecules are often required for detailed studies and for understanding mechanisms and functions. Although new structures are determined at a fast pace, the gap between known sequences and structures is widening (Kc, 2017). Owing to the low cost and wide availability of sequencing facilities, the numbers of sequenced genes and genomes of novel organisms are expanding. Structures are still missing for many proteins even in well-studied organisms. For example, structures for a large part of the human proteome are not yet available. Protein modelling may be a useful alternative when an experimental structure does not exist. Models and the process of predicting them are often poorly documented in the scientific literature. Unfortunately, this applies to descriptions of experimental studies as well. An analysis of 268 biomedical publications with experimental data revealed that only one article (0.37%) included a full report of protocols (Iqbal et al., 2016). This paper provides guidelines for detailed and proper reporting of modelling studies.
Protein structures can be modelled with methods from three major categories of tools (for reviews, see Dorn et al., 2014; Kuhlman and Bradley, 2019). Ab initio methods predict structures from scratch by searching for the most favorable energy conformations. These methods require extensive computational resources. The goal of fold recognition (also called threading) methods is to reveal the folding type of the protein of interest. Homology modelling, which is based on one or several related known structures, often provides the most reliable atom-level models. All these approaches are complicated and include several steps. Full description of these studies is the exception rather than the norm. Inadequate descriptions prevent full comprehension of the models and investigation of their quality, and hinder their evaluation, extension and repetition in new studies. Proper reporting would allow readers to identify problematic cases and details when peer review has failed to detect the deficiencies. In recent years, scientific communities have awakened to the reproducibility (repeatability) crisis (Baker, 2016; Begley and Ioannidis, 2015), one reason for which is inadequate description of studies.
Strategies and tips have been published on how to model structures (Dhingra et al., 2020; Dorn et al., 2014; Haddad et al., 2020); however, there are no instructions for describing protein models despite a recognized need (see e.g. Schwede et al., 2009). The only available guidelines are for small molecules related to medicinal chemistry, mainly drugs (Gund et al., 1988). Related recommendations and guidelines have been published in bioinformatics, e.g. for computer-aided variation interpretation (Vihinen, 2012, 2013) and for sequence alignments (Vihinen, 2020). Minimum Information About Bioinformatics Investigation (MIABI, https://fairsharing.org/FAIRsharing.28yec8) (Tan et al., 2010) provides basic reporting guidelines that cover the algorithm used, the analysis protocol, and the databases, resources, software and (web) services used. However, more detailed data are needed for structural models.
The goal here is to provide guidelines for reporting the use and results of molecular modelling. The guidelines originate from frustration in reading published articles and manuscripts that do not contain sufficient details to allow the reader to comprehend and evaluate the modelling process, the quality of the output, and the validity of the predictions made, e.g. in relation to function or diseases. These guidelines are simple to follow and apply. Modelling studies should be described in a similar degree of detail as experimental articles. As many journals do not allow full methodological details to be reported in the main article, authors should provide the details in supplementary material and other parts of the article. Many details can also be included in tables, figures and figure captions.