1 Introduction
In microbial bioprocesses, yeast extract is commonly used as a source of nitrogen, vitamins, and minerals. Yeast extract is a complex raw material usually produced from baker’s or brewer’s yeast through autolysis or chemical digestion.[1,2] It is also used as a supplement in serum-free media for mammalian cell culture and human immunoglobulin production.[3,4] Its composition varies among lots and brands because of the complex substrates, uncontrolled fermentation conditions during yeast cultivation, and variation in downstream processing during manufacturing.[5] These compositional differences often cause inconsistent fermentation performance in microbial processes. When this occurs, many yeast extracts are tested or screened in the laboratory to identify the extract most suitable for large-scale use.
Recombinant protein expression in Escherichia coli is an important technology for heterologous protein production.[6] When recombinant proteins are produced with the E. coli expression system, yeast extract is often added to increase enzymatic activity and protein production.[7,8,9] In some cases, other raw materials, such as sugar cane molasses and corn steep liquor, have been used in addition to yeast extract to increase heterologous protein production.[10] Experimental design can be used to optimize the medium composition for protein expression.[10] However, the variation in raw material composition is often ignored when medium components are optimized in the laboratory. Potvin et al. developed an automated turbidimetric system to screen yeast extracts for the growth of Lactobacillus plantarum.[11] Near-infrared (NIR) spectroscopy has been applied to investigate the effects of yeast extract composition on recombinant protein production.[12] For mammalian cell cultivation, a combination of spectroscopy and chemometrics has been used to characterize raw materials in media.[13] NIR is useful for real-time monitoring and quality checking of microbial cultivation.[14] However, this method does not provide feedback information for optimizing the media. In previous studies, we successfully used metabolomics-based approaches that combine non-targeted analysis via gas chromatography-mass spectrometry (GC-MS) with machine learning to estimate the effect of yeast extract on microbial growth.[15,16,17] We observed 165 peaks by GC-MS when E. coli was cultivated in 24 different medium compositions with 6 different yeast extracts. The data fit the partial least squares (PLS) regression model with reasonable accuracy. The PLS model identified several amino acids as important medium components, and some of these amino acids were found to influence E. coli growth in validation experiments.[15] This approach was also applied to bioethanol production. In the model fitting with PLS and deep neural networks (DNN),[16,17] the volatile components of hydrolysates derived from lignocellulosic biomass served as independent variables, and ethanol and cell yields served as dependent variables. However, this metabolomics approach has not yet been applied to heterologous protein production by E. coli mutants.
In general, PLS and its modified methods, such as orthogonal projections to latent structures and soft independent modelling of class analogy, are used in metabolomics studies.[18] DNN is a powerful tool for analyzing datasets derived from biological systems. However, it has appeared poorly suited to metabolomics studies because the contributing factors are difficult to identify. Date and Kikuchi reported that a DNN combined with a mean decrease accuracy calculation based on a permutation algorithm achieved higher classification accuracy than random forest (RF) and PLS and also identified important variables.[19]
In this study, we applied a DNN-mediated metabolomics approach to improve the estimation of the effects of raw materials on heterologous protein production during microbial cultivation, using green fluorescent protein (GFP) expression in E. coli grown with different yeast extract compositions. PLS, RF, neural network (NN), and DNN models were compared based on the degree of model fitting, and significant variables were estimated by a mean increase error (MIE) calculation based on a permutation algorithm.
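To illustrate the permutation idea behind an MIE-type calculation, the following minimal Python sketch shuffles one input variable at a time and records the mean increase in prediction error. It is a sketch under stated assumptions, not the implementation used in this study: the synthetic data, the ordinary least squares stand-in model, the function name `mean_increase_error`, the use of mean squared error, and the 20 repeats are all hypothetical choices made for illustration.

```python
import numpy as np

def mean_increase_error(predict, X, y, n_repeats=20, seed=0):
    """Permutation importance: mean increase in MSE when one column is shuffled.

    `predict` is any fitted model's prediction function (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)  # error with intact data
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle column j to break its link with the response
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((predict(X_perm) - y) ** 2) - baseline)
        scores[j] = np.mean(increases)
    return scores

# Synthetic stand-in for a peak-intensity matrix (assumption, not real data):
# only columns 0 and 2 influence the response.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + data_rng.normal(scale=0.1, size=200)

# Stand-in model: ordinary least squares with an intercept term
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(X_new):
    return np.column_stack([X_new, np.ones(len(X_new))]) @ coef

mie = mean_increase_error(predict, X, y)
ranking = np.argsort(mie)[::-1]  # variables ordered from most to least important
print("Variables ranked by mean increase in error:", ranking)
```

Because the procedure only needs a prediction function, the same loop could in principle be applied to PLS, RF, NN, or DNN models, which is what makes a permutation-based score convenient for comparing variable importance across model types.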