Comparing the results from bioinformatics prediction tools and
the experimental results
Here, we compared experimental results with bioinformatics predictions
for 40 recombinant proteins drawn from previously published articles.
The user-friendly, sequence-based predictor tools Protein-sol (PSoL),
FoldIndex, Recombinant Protein Solubility Prediction (RPSP) and SOLpro
were used to predict protein solubility (Table 1). Furthermore, we
determined physicochemical parameters of each protein: molecular weight
(MW), pI, helix percentage, aliphatic index and GRAVY. Helix percentage
was predicted with the self-optimized prediction method with alignment
(SOPMA); molecular weight, pI, aliphatic index and GRAVY were computed
using the ProtParam tool on the ExPASy server
(http://us.expasy.org/tools/protparam.html) (Table 2). The FoldIndex
predictions for 24 recombinant proteins are shown in graph form, with
the proteins that were expressed in soluble form in the laboratory
highlighted (Figure 1). Statistical analysis was performed using SPSS software. Data
analysis indicated that the prediction tools RPSP and SOLpro showed
higher sensitivity and specificity (RPSP: sensitivity 43.5% and
specificity 52.9%; SOLpro: sensitivity 56.5% and specificity 47.1%)
than FoldIndex and PSoL, while in comparison with experimental results
their kappa values were -0.34 and 0.36, respectively.
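
The three agreement measures reported above can be illustrated with a
short Python sketch. It is not the SPSS procedure used for the analysis,
only the standard textbook definitions of sensitivity, specificity and
Cohen's kappa applied to a 2x2 table. The counts are an assumption: they
were back-calculated from the reported percentages on the premise that
23 of the 40 proteins were soluble and 17 insoluble in the laboratory,
and they are not taken from the source tables, so the printed kappa
values need not match the reported figures. The names `Confusion`,
`rpsp` and `solpro` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of predicted versus experimentally observed solubility."""
    tp: int  # predicted soluble, observed soluble
    fn: int  # predicted insoluble, observed soluble
    fp: int  # predicted soluble, observed insoluble
    tn: int  # predicted insoluble, observed insoluble


def sensitivity(c: Confusion) -> float:
    """Fraction of experimentally soluble proteins predicted soluble."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """Fraction of experimentally insoluble proteins predicted insoluble."""
    return c.tn / (c.tn + c.fp)


def cohen_kappa(c: Confusion) -> float:
    """Agreement between prediction and experiment, corrected for chance."""
    n = c.tp + c.fn + c.fp + c.tn
    observed = (c.tp + c.tn) / n
    # Chance agreement: both say "soluble" plus both say "insoluble".
    expected = ((c.tp + c.fp) * (c.tp + c.fn)
                + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    return (observed - expected) / (1 - expected)


# ASSUMED counts (23 soluble + 17 insoluble = 40 proteins), chosen so that
# the sensitivity and specificity round to the percentages quoted in the text.
rpsp = Confusion(tp=10, fn=13, fp=8, tn=9)
solpro = Confusion(tp=13, fn=10, fp=9, tn=8)

for name, table in (("RPSP", rpsp), ("SOLpro", solpro)):
    print(f"{name}: sensitivity {sensitivity(table):.1%}, "
          f"specificity {specificity(table):.1%}, "
          f"kappa {cohen_kappa(table):.3f}")
```

A kappa near zero means the tool agrees with the experiment no better
than chance, and a negative value means it agrees slightly worse than
chance.
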
Moreover, we examined the effect of MW, pI, helix percentage, GRAVY,
aliphatic index, FoldIndex and PSoL on the solubility of recombinant
proteins by ROC curve analysis and comparison of averages, with the
experimental results as the gold standard (p-value < 0.05), and
determined certain considerations for the gene design of soluble
recombinant proteins. Although one report indicated that helical
structure reduces the solubility of proteins expressed in E. coli
(Bhandari, Gardner, & Lim, 2020), several reports demonstrate a
positive effect of a high percentage of helical structure on protein
solubility (Dai et al., 2014; Smialowski et al., 2012). In addition,
charge composition and the number of lysine, leucine, isoleucine,
asparagine, glutamine and threonine residues are beneficial for
improving soluble protein expression (Dai et al., 2014).
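
The sequence-derived quantities discussed above can be computed directly
from an amino acid sequence. The following Python sketch assumes the
definitions documented for ProtParam: GRAVY as the mean Kyte-Doolittle
hydropathy value, and the aliphatic index as
X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), where X is the mole percent
of each residue. The function names and the example sequence are
hypothetical, and the sketch handles only the 20 standard residues.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu)."""
    seq = seq.upper()
    counts = Counter(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * counts[aa] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def residue_counts(seq: str, residues: str = "KLINQT") -> dict:
    """Count the residues named in the text (Lys, Leu, Ile, Asn, Gln, Thr)."""
    counts = Counter(seq.upper())
    return {aa: counts[aa] for aa in residues}


# Made-up example sequence, used only to show the calls.
example = "MKTLLILAVNQTKEELIKG"
print(f"GRAVY: {gravy(example):.3f}")
print(f"Aliphatic index: {aliphatic_index(example):.1f}")
print(f"Residue counts: {residue_counts(example)}")
```

A positive GRAVY indicates an overall hydrophobic sequence and a
negative value an overall hydrophilic one.
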
In the present review, we have described some critical points in gene
design, choice of vector and host, cell culture conditions and
challenges worthy of consideration for soluble expression of
recombinant proteins in E. coli. Examination of the accuracy of
prediction tools by comparison with experimental results revealed
higher sensitivity and specificity for RPSP and SOLpro than for
FoldIndex and PSoL. However, the agreement between the experimental
results and the prediction tools was negligible.
Some parameters, such as helix structure, molecular weight and
aliphatic index, had a significant effect on protein solubility
(p-value < 0.05).
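
How well a single parameter separates soluble from insoluble proteins
in a ROC analysis can be summarised by the area under the curve (AUC).
The sketch below is an illustration in Python, not the SPSS analysis
used here. It relies on the rank interpretation of the AUC: the
probability that a randomly chosen soluble protein has a higher
parameter value than a randomly chosen insoluble one, with ties counted
as one half. The function name and the helix percentages are made up for
the example, and no p-value is computed.

```python
def roc_auc(soluble_values, insoluble_values):
    """Area under the ROC curve via pairwise comparison of the two groups.

    Returns 1.0 when every soluble protein scores higher than every
    insoluble one, 0.5 when the parameter carries no information, and
    0.0 when the ordering is fully reversed.
    """
    if not soluble_values or not insoluble_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for s in soluble_values:
        for i in insoluble_values:
            if s > i:
                wins += 1.0
            elif s == i:
                wins += 0.5  # ties contribute half a win
    return wins / (len(soluble_values) * len(insoluble_values))


# HYPOTHETICAL helix percentages, grouped by the experimental outcome
# (the gold standard); these numbers are not from Table 2.
helix_soluble = [42.0, 38.5, 51.2, 29.0]
helix_insoluble = [25.1, 33.0, 30.4]

print(f"AUC for helix percentage: {roc_auc(helix_soluble, helix_insoluble):.2f}")
```

An AUC well above 0.5 suggests that higher values of the parameter go
with soluble expression, and an AUC well below 0.5 suggests the
opposite.
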