Introduction
Biologically derived drugs have formed a notable sector of the pharmaceutical industry over the past 20 years. Prokaryotic systems cannot effectively express glycosylated biologically derived drugs. In addition, 90% of pharmaceutical proteins are typically terminated at the initial steps of clinical development because of their low solubility (Dai et al., 2014). In many cases, solubilizing proteins from inclusion bodies is considered an undesirable route to an active recombinant protein conformation. The solubility of a recombinant protein can indicate the quality of its function. Generally, 30% of recombinant proteins are expressed in aggregated or insoluble form (Malaei, Rasaee, Latifi, & Rahbarizadeh, 2019; Sørensen & Mortensen, 2005). Soluble, pure and functional proteins are in high demand in biotechnology for the development of vaccines and biologically derived drugs. The scarcity of natural protein sources, complex purification steps and high cost are factors that favor the use of recombinant cells as suitable tools for protein production. Owing to its short life cycle, capacity for high-density culture, well-characterized genetics and cost-effectiveness, the Gram-negative bacterium Escherichia coli (E. coli) is an attractive host for the expression of recombinant proteins. Despite these advantages, expression of recombinant proteins in E. coli mostly yields insoluble or inclusion body forms (Esmaili, Sadeghi, & Akbari, 2018; Fakruddin, Mohammad Mazumdar, Bin Mannan, Chowdhury, & Hossain, 2012; Singhvi, Saneja, Srichandan, & Panda, 2020; Terol, Gallego-Jara, Martínez, Díaz, & de Diego Puente, 2019).
Although inclusion body formation can simplify protein purification and increase recombinant protein yield, the refolding process involves a series of onerous tasks (Hamidi, Safdari, & Arabi, 2019; He & Ohnishi, 2017; Leong, Chua, Samah, & Chew, 2019), and the majority of refolded proteins lack biological activity. Soluble, properly folded protein, by contrast, is necessary for structural and functional studies (Rosano, Morales, & Ceccarelli, 2019). Hence, bioinformatics tools can be a useful approach to predicting the solubility of proteins overexpressed in E. coli.
To our knowledge, this is the first report comparing bioinformatics predictions with experimental results for the overexpression of soluble recombinant proteins in E. coli (Habibi, Hashim, Norouzi, & Samian, 2014). Here, the recommended strategies for improving soluble expression of a protein of interest are grouped into three categories: (1) gene design and bioinformatics prediction tools; (2) selection of vector and host strain; and (3) cell culture conditions.