1 Introduction
In microbial bioprocesses, yeast extract is commonly used as a source of
nitrogen, vitamins, and minerals. Yeast extract is a complex raw
material usually produced from baker’s or brewer’s yeast through
autolysis or chemical digestion.[1,2] It is also
used as a supplemental material in serum-free media for mammalian cell
culture and human immunoglobulin production.[3,4] The composition varies
among lots and brands because of the complex
substrates, uncontrolled fermentation conditions during yeast
cultivation, and variation of downstream processes during
manufacturing.[5] This lot-to-lot variation often causes inconsistent
fermentation performance in microbial processes. When this occurs, many
yeast extracts are tested or screened in the laboratory to identify the
extract most suitable for large-scale use.
Recombinant protein expression in Escherichia coli is an
important technology used in heterologous protein
production.[6] When producing recombinant proteins
with the E. coli protein expression system, yeast extract is
often added to increase enzymatic activity and protein
production.[7,8,9] In some cases, other raw
materials, such as sugar cane molasses and corn steep liquor, have been
used in addition to yeast extract to increase heterologous protein
production.[10] Experimental design can be used to optimize the medium
composition for protein expression.[10] However, the variation in raw
material composition is often ignored when optimizing medium components
in the laboratory. Porvin et al. developed an automated turbidimetric
system to screen yeast extracts for growth of Lactobacillus
plantarum.[11] Near-infrared (NIR) spectroscopy
has been applied to investigate the effects of yeast extract composition
on recombinant protein production.[12] In the case
of mammalian cell cultivation, a combination of spectroscopy and
chemometrics has been used for the characterization of raw materials in
media.[13] NIR is useful for real-time monitoring
and quality checking of microbial cultivation.[14] However, this method
does not provide feedback information for
optimizing the media. In previous studies, we successfully used
metabolomics-based approaches with non-targeted analyses via gas
chromatography-mass spectrometry (GC-MS) and machine learning to
estimate the effect of yeast extract on microbial
growth.[15,16,17] We observed 165 peaks
by GC-MS when E. coli was cultivated in 24
different medium compositions with 6 different yeast extracts. The data
fit a partial least squares regression (PLS) model with
reasonable accuracy. The PLS model identified several amino acids as
important medium components, and some of these amino acids
were found to influence E. coli growth in validation
experiments.[15] This approach was also applied to
bioethanol production. In the model fitting with PLS and deep neural
networks (DNN),[16,17] the volatile components of
hydrolysates derived from lignocellulosic biomass served as independent
variables and ethanol and cell yields served as dependent variables.
However, this metabolomics approach has never been applied to
heterologous protein production by E. coli mutants.
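
The kind of model fitting described above, with component intensities as independent variables and a yield as the dependent variable, can be illustrated with a minimal single-response PLS sketch. This is not the implementation used in the cited studies. It assumes one response variable and NIPALS-style deflation, and the function names `pls1_fit` and `pls1_predict` as well as the toy data are hypothetical.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by successive deflation (illustrative only)."""
    n, n_vars = len(X), len(X[0])
    # Centre the predictors (e.g. peak intensities) and the response (e.g. yield).
    x_mean = [sum(row[j] for row in X) / n for j in range(n_vars)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(n_vars)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in predictor space that covaries with the response.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # the response residual is already explained
        w = [v / norm for v in w]
        # Scores, predictor loadings, and response loading for this component.
        t = [sum(E[i][j] * w[j] for j in range(n_vars)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(n_vars)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove what this component explains from predictors and response.
        E = [[E[i][j] - t[i] * load[j] for j in range(n_vars)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new sample by repeating the deflation steps."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat


# Hypothetical toy data: three "peaks" per sample and an exactly linear response.
X_toy = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0],
         [1.0, 2.0, 3.0], [3.0, 0.0, 1.0], [0.0, 3.0, 2.0]]
y_toy = [2 * r[0] - r[1] + 0.5 * r[2] + 1 for r in X_toy]
toy_model = pls1_fit(X_toy, y_toy, n_components=3)
```

In real metabolomics data the number of variables usually exceeds the number of samples, so the number of components would be chosen by cross-validation rather than set to the number of variables as in this toy case.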
In general, PLS and its modified methods, such as orthogonal projections
to latent structures and soft independent modelling of class analogy,
are used in metabolomics studies.[18] DNN is a
powerful tool for analyzing datasets derived from biological systems.
However, it has appeared difficult to apply to metabolomics studies
because the contributing factors are hard to identify. Date and Kikuchi
reported that a DNN combined with a mean decrease in accuracy
calculation based on a permutation algorithm achieved higher
classification accuracy than random forest regression (RF) and PLS and
identified important variables.[19]
In this study, we applied a DNN-mediated metabolomics approach to
improve estimation of the effects of raw materials on foreign protein
production during microbial cultivation, using heterologous green
fluorescent protein (GFP) expression in E. coli with different yeast
extract compositions. The PLS, RF, neural network (NN), and DNN models
were compared based on the degree of model fitting, and significant
variables were estimated by a mean increase error (MIE) calculation
based on a permutation algorithm.
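
The idea behind a permutation-based importance score can be sketched as follows: shuffle one input variable at a time, re-evaluate the fitted model, and record how much the prediction error rises on average. This is a minimal illustration, not the MIE procedure of this study. It assumes mean squared error as the error measure, and the names `mean_increase_error` and `toy_model`, the number of repeats, and the toy data are hypothetical.

```python
import random


def mse(model, X, y):
    """Mean squared error of a prediction function over a dataset."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def mean_increase_error(model, X, y, n_repeats=20, seed=0):
    """Average rise in MSE when each variable is shuffled across samples."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other variables keep their values.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [column[i]] + row[j + 1:]
                      for i, row in enumerate(X)]
            increases.append(mse(model, X_perm, y) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


def toy_model(row):
    """Hypothetical fitted model: strong effect of variable 0, weak effect of 2."""
    return 3.0 * row[0] + 0.05 * row[2]


# Hypothetical toy data: variable 1 is present but unused by the model.
X_toy = [[0.0, 5.0, 3.0], [1.0, 2.0, 7.0], [2.0, 8.0, 1.0], [3.0, 1.0, 6.0],
         [4.0, 9.0, 0.0], [5.0, 4.0, 5.0], [6.0, 7.0, 2.0], [7.0, 3.0, 4.0]]
y_toy = [toy_model(row) for row in X_toy]
scores = mean_increase_error(toy_model, X_toy, y_toy)
```

Because the score only requires predictions from the fitted model, the same calculation can in principle be applied to PLS, RF, NN, or DNN models, which makes their variable rankings comparable.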