Comparing the results from bioinformatics prediction tools with the experimental results
Here, we compared bioinformatics predictions with the experimental results reported in previously published articles for 40 recombinant proteins. The user-friendly, sequence-based predictors Protein-sol, FoldIndex, Recombinant Protein Solubility Prediction (RPSP) and SOLpro were used to predict protein solubility (Table 1). Molecular weight (MW), isoelectric point (pI), aliphatic index and grand average of hydropathicity (GRAVY) were computed with the ProtParam tool on the ExPASy server (http://us.expasy.org/tools/protparam.html), and the helix percentage of each recombinant protein was predicted with the self-optimized prediction method with alignment (SOPMA) (Table 2). The FoldIndex results for 24 recombinant proteins are shown graphically, with the proteins that were expressed in soluble form in the laboratory highlighted (Figure 1). Statistical analysis was performed using SPSS software. RPSP and SOLpro showed higher sensitivity and specificity (RPSP: sensitivity 43.5% and specificity 52.9%; SOLpro: sensitivity 56.5% and specificity 47.1%) than FoldIndex and PSoL; their kappa values against the experimental results were -0.34 and 0.36, respectively.
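The agreement statistics reported above can be computed from a 2x2 table of predicted versus experimental solubility. The following is a minimal sketch of those standard formulas, not the SPSS procedure used here. The function names and the example counts are illustrative assumptions: the counts assume 23 soluble and 17 insoluble proteins, a split that is not stated in the text, and were chosen only so that sensitivity and specificity round to the values reported for RPSP.

```python
def sensitivity(tp, fn):
    """Fraction of experimentally soluble proteins predicted as soluble."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of experimentally insoluble proteins predicted as insoluble."""
    return tn / (tn + fp)


def cohen_kappa(tp, fn, tn, fp):
    """Cohen's kappa: agreement between prediction and experiment beyond chance."""
    n = tp + fn + tn + fp
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of the 2x2 table.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts (assumption, not taken from the source tables):
# 23 soluble proteins (10 predicted soluble), 17 insoluble (9 predicted insoluble).
tp, fn, tn, fp = 10, 13, 9, 8

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
print(f"kappa       = {cohen_kappa(tp, fn, tn, fp):.3f}")
```

A kappa near 0 indicates agreement no better than chance, and a value of 1 indicates perfect agreement, so kappa is a useful complement to sensitivity and specificity when the two classes are of unequal size.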
Moreover, we examined the effect of MW, pI, helix percentage, GRAVY, aliphatic index, FoldIndex and PSoL on the solubility of recombinant proteins using ROC curve analysis and comparison of averages, with the experimental results as the gold standard (p-value < 0.05), and derived certain considerations for designing genes for soluble recombinant proteins. Although one report indicated that helical structure reduces the solubility of proteins expressed in E. coli (Bhandari, Gardner, & Lim, 2020), several reports demonstrate a positive effect of a high helix percentage on protein solubility (Dai et al., 2014; Smialowski et al., 2012). In addition, charge composition and the number of lysine, leucine, isoleucine, asparagine, glutamine and threonine residues are reported to be beneficial for soluble protein expression (Dai et al., 2014).
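To illustrate how a ROC curve summarizes the ability of a single parameter to separate soluble from insoluble proteins, the sketch below computes the area under the ROC curve (AUC) directly from its rank interpretation: the probability that a randomly chosen soluble protein has a higher parameter value than a randomly chosen insoluble one. The function name and the helix percentages are hypothetical and serve only as an example; they are not data from this study, and this is not the SPSS analysis itself.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    scores: parameter values, e.g. helix percentage per protein
    labels: 1 = experimentally soluble, 0 = experimentally insoluble
    """
    soluble = [s for s, y in zip(scores, labels) if y == 1]
    insoluble = [s for s, y in zip(scores, labels) if y == 0]
    if not soluble or not insoluble:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos in soluble:
        for neg in insoluble:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(soluble) * len(insoluble))


# Hypothetical helix percentages for six proteins (illustrative only).
helix_percent = [62.0, 55.0, 48.0, 41.0, 35.0, 30.0]
soluble_label = [1, 1, 0, 1, 0, 0]

print(f"AUC = {roc_auc(helix_percent, soluble_label):.3f}")
```

An AUC of 0.5 means the parameter does not discriminate between the two groups, whereas values approaching 1 (or 0, for an inverse relationship) indicate that the parameter is informative about solubility.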
In the present review, we described some critical points in gene design, choice of vector and host, and cell culture conditions, as well as challenges worthy of consideration, for the soluble expression of recombinant proteins in E. coli. Comparison of the prediction tools with experimental results revealed higher sensitivity and specificity for RPSP and SOLpro than for FoldIndex and PSoL. However, the agreement between the experimental results and the predictions was negligible. Some parameters, such as helix structure, molecular weight and aliphatic index, had a significant effect on protein solubility (p-value < 0.05).