PROTEINS: Structure, Function, and Bioinformatics - Authorea

Helical Twists and ß-Turns in Structures at Serine–Proline Sequences: Stabilization o...

Neal J. Zondlo

May 26, 2024

Structures at serine-proline sites in proteins were analyzed using a combination of peptide synthesis with structural methods and bioinformatics analysis of the PDB. Dipeptides were synthesized with the proline derivative (2S,4S)-(4-iodophenyl)hydroxyproline [hyp(4-I-Ph)]. The crystal structure of Boc-Ser-hyp(4-I-Ph)-OMe had two molecules in the unit cell. One molecule exhibited cis-proline and a type VIa2 β-turn (BcisD). The cis-proline conformation was stabilized by a C–H/O interaction between Pro C–Hα and the Ser side-chain oxygen. NMR data were consistent with stabilization of cis-proline by a C–H/O interaction in solution. The other crystallographically observed molecule had trans-Pro and both residues in the PPII conformation. Two conformations were observed in the crystal structure of Ac-Ser-hyp(4-I-Ph)-OMe, with Ser adopting PPII in one and the β conformation in the other, each with Pro in the δ conformation and trans-Pro. Structures at Ser-Pro sequences were further examined via bioinformatics analysis of the PDB and via DFT calculations. Ser-Pro versus Ala-Pro sequences were compared to identify bases for Ser stabilization of local structures. C–H/O interactions between the Ser side-chain Oγ and Pro C–Hα were observed in 45% of structures with Ser-cis-Pro in the PDB, with nearly all Ser-cis-Pro structures adopting a type VI β-turn. 53% of Ser-trans-Pro sequences exhibited main-chain C=O(i)•••H–N(i+3) or C=O(i)•••H–N(i+4) hydrogen bonds, with Ser as the i residue and Pro as the i+1 residue. These structures were overwhelmingly either type I β-turns or N-terminal capping motifs on α-helices or 3₁₀-helices. These results indicate that Ser-Pro sequences are particularly potent in favoring these structures. In each, Ser is in either the PPII or β conformation, with the Ser Oγ capable of engaging in a hydrogen bond with the amide N–H of the i+2 (type I β-turn or 3₁₀-helix; Ser χ1 t) or i+3 (α-helix; Ser χ1 g+) residue.
Non-proline cis amide bonds can also be stabilized by C–H/O interactions.
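
The cis/trans distinction used throughout this abstract is conventionally read from the peptide-bond dihedral ω, defined by the atoms Cα(i), C(i), N(i+1) and Cα(i+1). The following is a minimal sketch of such a classification, not the authors' analysis pipeline: the function names and the 30°/150° cutoffs are illustrative assumptions, and only the magnitude of ω is computed.

```python
import math

def abs_dihedral(p0, p1, p2, p3):
    """Magnitude (0 to 180 degrees) of the dihedral defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)  # normals of the two planes
    cos_angle = dot(n1, n2) / math.sqrt(dot(n1, n1) * dot(n2, n2))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

def classify_omega(abs_omega, cis_max=30.0, trans_min=150.0):
    """Label a peptide bond from |omega|; the cutoffs are assumed, not from the source."""
    if abs_omega <= cis_max:
        return "cis"
    if abs_omega >= trans_min:
        return "trans"
    return "distorted"

# Toy planar coordinates (hypothetical, not taken from any structure):
# CA(i), C(i), N(i+1), CA(i+1) arranged in a U shape give |omega| = 0.
print(classify_omega(abs_dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))))
```

With real coordinates, the four points would be taken from consecutive residues of a structure file; the toy coordinates above only exercise the geometry.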

Dynamically driven correlations in elastic net models reveal sequence of events and c...

Burak Erman

and 1 more

April 20, 2024

Protein dynamics orchestrate allosteric regulation, but elucidating the sequence of events and causal relationships within these intricate processes remains challenging. We introduce the Dynamically Perturbed Gaussian Network Model (DP-GNM), a novel approach that uncovers the directionality of information flow within proteins. DP-GNM leverages time-dependent correlations to achieve two goals: identifying driver and driven residues and revealing communities of residues exhibiting synchronized dynamics. Applied to wild type and mutated structures of Cyclophilin A, DP-GNM unveils a hierarchical network of information flow, where key residues initiate conformational changes that propagate through the protein in a directed manner. This directional causality illuminates the intricate relationship between protein dynamics and allosteric regulation, providing valuable insights into protein function and potential avenues for drug design. Furthermore, DP-GNM’s potential to elucidate dynamics under periodic perturbations like the circadian rhythm suggests its broad applicability in understanding complex biological processes governed by environmental cycles.

Allosteric Modulation of Fluorescence Revealed by Hydrogen Bond Dynamics in a Genetic...

Canan Atilgan

and 1 more

April 08, 2024

Genetically encoded fluorescent biosensors (GEFBs) proved to be reliable tracers for many metabolites and cellular processes. In the simplest case, a fluorescent protein (FP) is genetically fused to a sensing protein which undergoes a conformational change upon ligand binding. This drives a rearrangement in the chromophore environment and changes the spectral properties of the FP. Structural determinants of successful biosensors are revealed only in hindsight when the crystal structures of both ligand-bound and ligand-free forms are available. This makes the development of new biosensors for desired analytes a long trial-and-error process. In the current study, we conducted µs-long all atom molecular dynamics (MD) simulations of a maltose biosensor in both the apo (dark) and holo (bright) forms. We performed detailed hydrogen bond occupancy analyses to shed light on the mechanism of ligand induced conformational change in the sensor protein and its allosteric effect on the chromophore environment. We find that two strong indicators for distinguishing bright and dark states of biosensors are due to substantial changes in hydrogen bond dynamics in the system and solvent accessibility of the chromophore.
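
Hydrogen bond occupancy is commonly computed as the fraction of trajectory frames in which a donor-acceptor pair satisfies geometric criteria. The sketch below illustrates that idea under assumed cutoffs (donor-acceptor distance of at most 3.5 Å and donor-H-acceptor angle of at least 150°); the abstract does not state the criteria the authors used, and the function name and the per-frame values are hypothetical.

```python
def hbond_occupancy(distances, angles, max_dist=3.5, min_angle=150.0):
    """Fraction of frames in which one donor-acceptor pair counts as hydrogen bonded.

    distances: donor-acceptor distance per frame, in angstroms
    angles:    donor-H-acceptor angle per frame, in degrees
    The default cutoffs are assumed common geometric criteria.
    """
    if len(distances) != len(angles):
        raise ValueError("distances and angles must have one entry per frame")
    if not distances:
        return 0.0
    formed = sum(1 for d, a in zip(distances, angles)
                 if d <= max_dist and a >= min_angle)
    return formed / len(distances)

# Hypothetical per-frame values for one pair in two simulated states.
apo = hbond_occupancy([2.9, 3.1, 4.0, 3.4], [160.0, 155.0, 170.0, 120.0])
holo = hbond_occupancy([2.8, 3.0, 3.2, 3.3], [165.0, 158.0, 152.0, 161.0])
print(f"apo={apo:.2f} holo={holo:.2f} change={holo - apo:+.2f}")
```

Comparing the occupancy of each pair between the apo and holo trajectories is one simple way to flag hydrogen bonds whose dynamics differ between the two states.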

Pyroglutamylation Modulates Electronic Properties and the Conformational Ensemble of...

Justin Lemkul

and 1 more

November 20, 2023

Alzheimer’s disease (AD) is a neurodegenerative disorder that is characterized by the formation of extracellular amyloid-β (Aβ) plaques. The underlying cause of AD is unknown; however, post-translational modifications (PTMs) of Aβ have been found in AD patients and are thought to play a role in protein aggregation. One such PTM is pyroglutamylation, which can occur at two sites in Aβ, Glu3 and Glu11. This modification of Aβ involves the truncation and charge-neutralization of N-terminal glutamate, causing Aβ to become more hydrophobic and prone to aggregation. The molecular mechanism by which the introduction of pyroglutamate (pE) promotes aggregation has not been determined. To gain a greater understanding of the role that charge neutralization and truncation of the N-terminus play in Aβ conformational sampling, we used the Drude polarizable force field (FF) to perform molecular dynamics simulations on Aβ pE3-42 and Aβ pE11-42 and compared their properties to previous simulations of Aβ 1-42. The Drude polarizable FF allows for a more accurate representation of electrostatic interactions, therefore providing novel insights into the role that charge plays in protein dynamics. Here, we report the parametrization of pE in the Drude polarizable FF and the effect of pyroglutamylation on Aβ. We found that Aβ pE3-42 and Aβ pE11-42 alter the permanent and induced dipoles of the peptide. Specifically, we found that Aβ pE3-42 and Aβ pE11-42 have modification-specific backbone and sidechain polarization responses and perturbed solvation properties that shift the Aβ conformational ensemble.

Critical Assessment of Methods of Protein Structure Prediction (CASP) – Round XV

Andriy Kryshtafovych

and 4 more

October 06, 2023

Computing protein structure from amino acid sequence information has been a long-standing grand challenge. CASP (Critical Assessment of Structure Prediction) conducts community experiments aimed at advancing solutions to this and related problems. Experiments are conducted every two years. The 2020 experiment (CASP14) saw major progress, with the second generation of deep learning methods delivering accuracy comparable with experiment for many single proteins. There is an expectation that these methods will have much wider application in computational structural biology. Here we summarize results from the most recent experiment, CASP15, in 2022, with an emphasis on new deep learning-driven progress. Other papers in this special issue of Proteins provide more detailed analysis. For single protein structures, the AlphaFold2 deep learning method is still superior to other approaches, but there are two points of note. First, although AlphaFold2 was the core of all the most successful methods, there was a wide variety of implementation and combination with other methods. Second, using the standard AlphaFold2 protocol and default parameters only produces the highest quality result for about two-thirds of the targets, and more extensive sampling is required for the others. The major advance in this CASP is the enormous increase in the accuracy of computed protein complexes, achieved by the use of deep learning methods, although overall these do not fully match the performance for single proteins. Here too, AlphaFold2-based methods perform best, and again more extensive sampling than the defaults is often required. Also of note are the encouraging early results on the use of deep learning to compute ensembles of macromolecular structures. 
Critically for the usability of computed structures, for both single proteins and protein complexes, deep learning-derived estimates of both local and global accuracy are of high quality; however, the estimates in interface regions are slightly less reliable. CASP15 also included computation of RNA structures for the first time. Here, the classical approaches produced better agreement with experiment than the new deep learning ones, and accuracy is limited. Also, for the first time, CASP included the computation of protein-ligand complexes, an area of special interest for drug design. Here too, classical methods were still superior to deep learning ones. Many new approaches were discussed at the CASP conference, and it is clear methods will continue to advance.

Exploiting Protein Language Models for the Precise Classification of Ion Channels and...

Hamed Ghazikhani

and 1 more

September 01, 2023

This study presents TooT-PLM-ionCT, a holistic framework that exploits the capabilities of six diverse Protein Language Models (PLMs) - ProtBERT, ProtBERT-BFD, ESM-1b, ESM-2 (650M parameters), and ESM-2 (15B parameters) - for precise classification of integral membrane proteins, specifically ion channels (ICs) and ion transporters (ITs). As these proteins play a pivotal role in the regulation of ion movement across cellular membranes, they are integral to numerous biological processes and overall cellular vitality. To circumvent the costly and time-consuming nature of wet lab experiments, we harness the predictive prowess of PLMs, drawing parallels with techniques in natural language processing. Our strategy engages six classifiers, embracing both conventional methodologies and a deep learning model, to segregate ICs and ITs from other membrane proteins, as well as differentiate ICs from ITs. Furthermore, we delve into critical factors influencing our tasks, including the implications of dataset balancing, the effect of frozen versus fine-tuned PLM representations, and the potential variance between half and full precision floating-point computations. Our empirical results showcase superior performance in distinguishing ITs from other membrane proteins and differentiating ICs from ITs, while the task of discriminating ICs from other membrane proteins exhibits results commensurate with the current state-of-the-art.

Breaking the conformational ensemble barrier: Ensemble structure modeling challenges...

Andriy Kryshtafovych

and 5 more

August 14, 2023

For the first time, the 2022 CASP (Critical Assessment of Structure Prediction) community experiment included a section on computing multiple conformations for protein and RNA structures. There was full or partial success in reproducing the ensembles for four of the nine targets, an encouraging result. For protein structures, enhanced sampling with variations of the AlphaFold2 deep learning method was by far the most effective approach. One substantial conformational change caused by a single mutation across a complex interface was accurately reproduced. In two other assembly modeling cases, methods succeeded in sampling conformations near to the experimental ones even though environmental factors were not included in the calculations. An experimentally derived flexibility ensemble allowed a single accurate RNA structure model to be identified. Difficulties included how to handle sparse or low-resolution experimental data and the current lack of effective methods for modeling RNA/protein complexes. However, these and other obstacles appear addressable.

The N-terminal intrinsically disordered region of Ncb5or docks with the cytochrome b5...

Hao Zhu

and 8 more

July 22, 2023

Ncb5or (NADH cytochrome b5 oxidoreductase) is a cytosolic ferric reductase implicated in diabetes and neurological conditions. Ncb5or comprises cytochrome b5 (b5) and cytochrome b5 reductase (b5R) domains separated by a CHORD-Sgt1 (CS) linker domain. Ncb5or redox activity depends on proper interdomain interactions to mediate electron transfer from NADH or NADPH via FAD to heme. While full-length human Ncb5or has proven resistant to crystallization, we have succeeded in obtaining high-resolution atomic structures of the b5 domain and a construct containing the CS and b5R domains (CS/b5R). Ncb5or also contains an N-terminal intrinsically disordered region of 50 residues with a distinctive, conserved L 34MDWIRL 40 motif that has no homologs in animals but is present in root lateral formation protein (RLF) in rice and Increased Recombination Center 21 (IRC21) in baker’s yeast, and in these proteins, it is likewise attached to a b5 domain. After unsuccessful attempts at crystallizing a human Ncb5or construct comprising the N-terminal region naturally fused to the b5 domain, we were able to obtain a high-resolution atomic structure of a recombinant rice RLF construct corresponding to residues 25-129 of human Ncb5or (52% sequence identity; 74% similarity). The structure reveals Trp120 (corresponding to invariant Trp37 in Ncb5or) to be part of an 11-residue α-helix (S 116QMDWLKLTRT 126) packing against two of the four helices in the b5 domain that surround heme (α2 and α5). The Trp120 side chain forms a network of interactions with the side chains of four highly conserved residues corresponding to Tyr85 and Tyr88 (α2), Cys124 (α5), and Leu47 in Ncb5or. Circular dichroism (CD) measurements of human Ncb5or fragments further support a key role of Trp37 in nucleating the formation of the N-terminal helix, whose location in the N/b5 module suggests a role in regulating the function of this multidomain redox enzyme. 
This study revealed for the first time an ancient origin of a helical motif in the N/b5 module as reflected by its existence in a class of cytochrome b5 proteins from three kingdoms among eukaryotes.

Community analysis of large-scale molecular dynamics simulations elucidated dynamics-...

Metaxia Vlassi

and 1 more

July 18, 2023

TYK2 is a non-receptor tyrosine kinase, a member of the Janus kinase (JAK) family, with a central role in several diseases, including cancer. The JAKs’ catalytic domains (KD) are highly conserved, yet the isolated TYK2-KD exhibits unique specificities. In previous work, using molecular dynamics (MD) simulations of a catalytically impaired TYK2-KD variant (P1104A), we found that this amino-acid change in its JAK-characteristic insert (αFG) acts at the dynamics level. Given that structural dynamics is key to allosteric activation of protein kinases, in this study we applied a long-scale MD simulation and investigated an active TYK2-KD form in the presence of adenosine 5’-triphosphate and one magnesium ion, which represents a dynamic and crucial step of the catalytic cycle in other protein kinases. Community analysis of the MD trajectory shed light, for the first time, on the dynamic profile and dynamics-driven allosteric communications within the TYK2-KD during activation and revealed that αFG, and amino-acids P1104, P1105 and I1112 in particular, hold a pivotal role and act synergistically with a dynamically coupled communication network of amino-acids serving intra-KD signaling for allosteric regulation of TYK2 activity. Corroborating our findings, most of the identified amino-acids are associated with cancer-related missense/splice-site mutations of the Tyk2 gene. We propose that the conformational dynamics at this step of the catalytic cycle, coordinated by αFG, underlies TYK2-unique substrate recognition and accounts for its distinct specificity. In total, this work adds to knowledge towards an in-depth understanding of TYK2 activation and may be valuable towards a rational design of allosteric TYK2-specific inhibitors.

Impact of AlphaFold on Structure Prediction of Protein Complexes: The CASP15-CAPRI Ex...

Marc Lensink

and 112 more

July 09, 2023

We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homo-dimers, 3 homo-trimers, 13 hetero-dimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their 5 best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier, a remarkable improvement resulting from the wide use of the AlphaFold2 and AlphaFold-Multimer software. Creative use was made of the deep learning inference engines, affording the sampling of a much larger number of models and enriching the multiple sequence alignments with sequences from various sources. Wide use was also made of the AlphaFold confidence metrics to rank models, permitting top performing groups to exceed the results of the public AlphaFold-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.

Challenges in Bridging the Gap Between Protein Structure Prediction and Functional In...

Mihaly Varadi

and 2 more

June 30, 2023

The rapid evolution of protein structure prediction tools has significantly broadened access to protein structural data. Although predicted structure models have the potential to accelerate and impact fundamental and translational research significantly, it is essential to note that they are not validated and cannot be considered the ground truth. Thus, challenges persist, particularly in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations are crucial to overcoming these obstacles. Databases like the AlphaFold Protein Structure Database, the ESM Metagenomic Atlas, and initiatives like the 3D-Beacons Network provide FAIR access to these data, enabling their interpretation and application across a broader scientific community. Whilst substantial advancements have been made in protein structure prediction, further progress is required to address the remaining challenges. Developing training materials, nurturing collaborations, and ensuring open data sharing will be paramount in this pursuit. The continued evolution of these tools and methodologies will deepen our understanding of protein function and accelerate discoveries in disease pathogenesis and drug development.

CASP15 cryoEM protein and RNA targets: refinement and analysis using experimental map...

Thomas Mulvaney

and 7 more

June 22, 2023

CASP assessments primarily rely on comparing predicted coordinates with experimental reference structures. However, errors in the reference structures can potentially reduce the accuracy of the assessment. This issue is particularly prominent in cryoEM-determined structures, and therefore, in the assessment of CASP15 cryoEM targets, we directly utilized density maps to evaluate the predictions. A method for ranking the quality of protein chain predictions based on rigid fitting to experimental density was found to correlate well with the CASP assessment scores. Overall, the evaluation against the density map indicated that the models are of high accuracy although local assessment of predicted side chains in a 1.52 Å resolution map showed that side-chains are sometimes poorly positioned. The top 136 predictions associated with 9 protein target reference structures were selected for refinement, in addition to the top 40 predictions for 11 RNA targets. To this end, we have developed an automated hierarchical refinement pipeline in cryoEM maps. For both proteins and RNA, the refinement of CASP15 predictions resulted in structures that are close to the reference target structure, including some regions with better fit to the density. This refinement was successful despite large conformational changes and secondary structure element movements often being required, suggesting that predictions from CASP-assessed methods could serve as a good starting point for building atomic models in cryoEM maps for both proteins and RNA. Loop modeling continued to pose a challenge for predictors with even short loops failing to be accurately modeled or refined at times. The lack of consensus amongst models suggests that modeling holds the potential for identifying more flexible regions within the structure.

Signatures of tRNA Glx -specificity in bacterial glutamyl-tRNA synthetases

Gautam Basu

and 3 more

June 21, 2023

The canonical function of glutamyl-tRNA synthetase (GluRS) is to glutamylate tRNA Glu. Yet, not all bacterial GluRSs glutamylate tRNA Glu; many glutamylate both tRNA Glu and tRNA Gln, while some glutamylate only tRNA Gln and not the cognate substrate tRNA Glu. Understanding the basis of this unique tRNA Glx-specificity is important. Mutational studies have hinted at hotspot residues, both on tRNA Glx and GluRS, that play crucial roles in tRNA Glx-specificity, but the underlying structural basis remains unexplored. The majority of biochemical studies related to tRNA Glx-specificity have been performed on GluRS from Escherichia coli and other proteobacterial species. However, since the early crystal structures of GluRS and tRNA Glu-bound GluRS were from non-proteobacterial species (Thermus thermophilus), the proteobacterial biochemical data have often been interpreted in the context of non-proteobacterial GluRS structures. Marked differences between proteo- and non-proteobacterial GluRSs have been demonstrated, and therefore it is important that tRNA Glx-specificity be understood vis-a-vis proteobacterial GluRS structures. Towards this goal, we have solved the crystal structure of GluRS from E. coli. Using the solved structure and several other currently available proteo- and non-proteobacterial GluRS crystal structures, we have probed the structural basis of tRNA Glx-specificity of bacterial GluRSs. Specifically, our analysis suggests a unique role played by a tRNA Glx D-helix contacting loop of GluRS in modulation of tRNA Gln-specificity. While earlier studies had identified functional hotspots on tRNA Glx that controlled tRNA Glx-specificity of GluRS, this is the first report of complementary signatures of tRNA Glx-specificity in GluRS.

Comparative analysis of permanent and transient domain-domain interactions in multi-d...

Ramanathan Sowdhamini

and 2 more

June 01, 2023

Protein domains are structural, functional, and evolutionary units. These domains bring out the diversity of functionality by means of interactions with other co-existing domains and provide stability. Hence, it is important to study intra-protein inter-domain interactions from the perspective of types of interactions. Domains within a chain could interact over short timeframes or permanently, rather like protein-protein interactions (PPIs). However, no systematic study has been carried out between two classes, namely permanent and transient domain-domain interactions (DDIs). In this work, we studied 264 two-domain proteins, belonging to either of these classes and their interfaces on the basis of several factors, such as interface area and details of interactions (number, strengths, and types of interactions). We also characterized them based on residue conservation at the interface, correlation of residue motions across domains, its involvement in repeat formation, and their involvement in particular molecular processes. Finally, we could analyse the interactions arising from domains in two-domain monomeric proteins, and we observed significant differences between these two classes of domain interactions and a few similarities. This study will help to obtain a better understanding of structure-function and folding principles of multi-domain proteins.

The alteration of structural network upon transient association between proteins stud...

Ramanathan Sowdhamini

and 4 more

May 31, 2023

Proteins such as enzymes perform their functions predominantly through non-covalent interactions between transiently interacting units. Despite the transient nature of such interactions, they affect the overall structural topology of the protein, enabling proteins to deactivate or activate. This alteration of the structural topology is studied by employing protein structural networks, which are node-edge representative models of protein structure, reported as a robust tool for capturing interactions between residues. Several methods have been optimised to collect meaningful, functionally relevant information by studying alteration of structural networks. In this article, different methods of comparing protein structural networks are employed, along with spectral decomposition of graphs, to study the subtle impact of protein-protein interactions. A detailed analysis of the structural network of interacting partners is performed across a dataset of around 900 pairs of bound complexes and corresponding unbound protein structures. The variation in network parameters at, around and far away from the interface is analysed. Finally, we present interesting case studies, where an allosteric mechanism of structural impact is understood from communication-path detection methods. The results of this analysis are beneficial in understanding protein stability, for future engineering and docking studies.

A spectrophotometric trimethylamine monooxygenase assay

Gurunath Ramanathan

and 2 more

May 31, 2023

Trimethylamine monooxygenase (Tmm, EC-1.14.13.148) belongs to the family of flavin-containing monooxygenases (FMOs) that oxidize trimethylamine into trimethylamine-N-oxide (TMAO). Conventional methods for assaying Tmm are accurate over a narrow range of substrate/product concentrations. Here we report a TMAO-specific enzymatic assay for Tmm using polyallylamine hydrochloride (PAHCl)-capped MnO₂ nanoparticles (PAHCl@MnO₂). We achieved TMAO specificity using iodoacetonitrile to remove interfering trimethylamine. The change in the concentration of TMAO is measured by observing the difference in the absorbance of 3,3´,5,5´-tetramethylbenzidine (TMB) at 652 nm. The assay is tolerant to several interfering metal ions and other compounds. This method is more reliable and easier than currently known methods. The limit of detection (LOD) and limit of quantitation (LOQ) are 1 µM and 10 µM, respectively, for direct TMAO measurement.

Tertiary structure assessment at CASP15

Daniel Rigden

and 7 more

May 24, 2023

The results of tertiary structure assessment at CASP15 are reported. For the first time, recognising the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single chain predictions were assessed together, irrespective of whether a template was available. At CASP15 there was no single stand-out group, with most of the best-scoring groups - led by PEZYFoldings, UM-TBM and Yang Server - employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues: small size, high α-helical content and monomeric structure were other likely aggravating factors. Local divergence between prediction and target correlated with localisation at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, but should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups, including those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas, produced high quality predictions for most targets, which are valuable for experimental structure determination, functional analysis and many other tasks across biology.

The energetics and evolution of oxidoreductases in deep time

Vikas Nanda

and 16 more

May 18, 2023

The core metabolic reactions of life drive electrons through a class of redox protein enzymes, the oxidoreductases. The energetics of electron flow is determined by the redox potentials of organic and inorganic cofactors as tuned by the protein environment. Understanding how protein structure affects oxidation-reduction energetics is crucial for studying metabolism, creating bioelectronic systems, and tracing the history of biological energy utilization on Earth. We constructed ProtReDox ([https://protein-redox-potential.web.app](https://protein-redox-potential.web.app)), a manually curated database of experimentally determined redox potentials. With over 500 measurements, we can begin to identify how proteins modulate oxidation-reduction energetics across the tree of life. By mapping redox potentials onto networks of oxidoreductase fold evolution, we can infer the evolution of electron transfer energetics over deep-time. ProtReDox is designed to include user-contributed submissions with the intention of making it a valuable resource for researchers in this field.