Figure 1. Process flow diagram of ethylene oxide production. (b) Separation section.
Design of experiments (DoE)
To reduce the number of simulations or experiments, a DoE was developed. DoE is a method for deciding which experimental candidates should be run first. Here, we adopt the D-optimal design13, which can select several informative X values from a large number of candidates. It is therefore possible to select the X candidates that are likely to yield a good regression model using only the dataset of X.
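The D-optimal selection above can be sketched with a simple greedy algorithm: starting from one candidate, repeatedly add the candidate that maximizes the determinant of the information matrix \(\mathbf{X}_s^T\mathbf{X}_s\) of the selected design. This is a minimal illustration, not the exact algorithm used in the paper; the function name and the small ridge term are assumptions for the sketch.

```python
import numpy as np

def d_optimal_select(X, n_select, seed=0):
    """Greedy D-optimal selection: repeatedly add the candidate that
    maximizes det(X_s^T X_s) of the selected design matrix X_s.
    A small ridge term keeps the determinant nonzero before the
    selected design has full rank."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    selected = [int(rng.integers(n))]          # random starting candidate
    remaining = set(range(n)) - set(selected)
    while len(selected) < n_select:
        best_i, best_det = None, -np.inf
        for i in remaining:
            Xs = X[selected + [i]]
            det = np.linalg.det(Xs.T @ Xs + 1e-8 * np.eye(p))
            if det > best_det:
                best_i, best_det = i, det
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

# Usage: pick 5 informative rows out of 100 random 3-D candidates,
# using only the X data (no y values are needed).
X = np.random.default_rng(1).uniform(-1, 1, size=(100, 3))
idx = d_optimal_select(X, n_select=5)
```

Because only X enters the criterion, the informative candidates can be chosen before any simulation or experiment is run, as stated above.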
Adaptive DoE
An adaptive DoE that decides on the experimental candidates sequentially, on the basis of the experimental results, was developed. Response surface methodology15 and Bayesian optimization12 are often used in adaptive DoE. Here, we use Bayesian optimization, which can make estimates in extrapolation regions and from only a small number of samples. Bayesian optimization has been used to search for optimal experimental conditions16, 17, 18, 19 and process conditions20, 21, 22, and to optimize hyperparameters23. Here, we apply it to process design.
Gaussian process regression (GPR)
Bayesian optimization uses Gaussian process regression (GPR)24. When the input x is given, the output \(y(\mathbf{x})\) is represented as a probability model that follows a normal distribution. Assuming that the GPR model is linear, the i-th sample is given by:
\(y^{(i)}=\mathbf{x}^{(i)}\mathbf{b}\) (3)
where \(\mathbf{b}\) is a vector of regression coefficients. The prior distribution of \(\mathbf{b}\) is assumed to be a normal distribution with zero mean and variance \({\sigma_{b}}^{2}\). Then, the mean \(m_{i}\) of \(y^{(i)}\) and the covariance \({\sigma_{yi,j}}^{2}\) of \(y^{(i)}\) and \(y^{(j)}\) are calculated from:
\(m_{i}=E\left[y^{(i)}\right]=0\) (4)
\({\sigma_{yi,j}}^{2}=\mathrm{cov}\left[y^{\left(i\right)},y^{\left(j\right)}\right]=\mathbf{x}^{(i)}{\mathbf{x}^{(j)}}^{T}{\sigma_{b}}^{2}\) (5)
The input x is transformed by a nonlinear function \(\varphi\), and \({\sigma_{yi,j}}^{2}\) is then calculated from:
\({\sigma_{yi,j}}^{2}=\varphi\left(\mathbf{x}^{(i)}\right)\varphi\left(\mathbf{x}^{(j)}\right)^{T}{\sigma_{b}}^{2}\) (6)
Actual y values have measurement error. The i-th sample including measurement error is \({y_{\text{obs}}}^{(i)}\), and its measurement error is \(e^{(i)}\). Thus, \({y_{\text{obs}}}^{(i)}\) is given by:
\({y_{\text{obs}}}^{(i)}=y^{(i)}+e^{(i)}\) (7)
\(e^{(i)}\) is assumed to follow a normal distribution with zero mean and variance \({\sigma_{e}}^{2}\), and \(e^{(i)}\) is independent for each sample. Then, the covariance \({\sigma_{\text{yobs}\,i,j}}^{2}\) between \({y_{\text{obs}}}^{(i)}\) and \({y_{\text{obs}}}^{(j)}\) is calculated as follows:
\({\sigma_{\text{yobs\ }i,j}}^{2}=\varphi\left(\mathbf{x}^{(i)}\right)\varphi\left(\mathbf{x}^{(j)}\right)^{T}{\sigma_{b}}^{2}+\delta_{i,j}{\sigma_{e}}^{2}=K\left(\mathbf{x}^{(i)},\mathbf{x}^{(j)}\right)\)(8)
where K is a kernel function.
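The covariance matrix of Eq. (8) can be sketched directly in code. Since \(\varphi\) is not specified above, a Gaussian (RBF) kernel is used here as an assumed stand-in for \(\varphi(\mathbf{x}^{(i)})\varphi(\mathbf{x}^{(j)})^{T}{\sigma_{b}}^{2}\) (it corresponds to an infinite-dimensional feature map); the `gamma` width parameter and the noise level are illustrative values.

```python
import numpy as np

def kernel(xi, xj, sigma_b2=1.0, gamma=1.0):
    # Assumed stand-in for phi(x_i) phi(x_j)^T * sigma_b^2 in Eq. (8):
    # a Gaussian (RBF) kernel.
    return sigma_b2 * np.exp(-gamma * np.sum((xi - xj) ** 2))

def gram_matrix(X, sigma_b2=1.0, gamma=1.0, sigma_e2=1e-2):
    """Covariance matrix of y_obs, Eq. (8): kernel part plus the
    measurement-noise term delta_{i,j} * sigma_e^2 on the diagonal."""
    n = len(X)
    K = np.array([[kernel(X[i], X[j], sigma_b2, gamma)
                   for j in range(n)] for i in range(n)])
    return K + sigma_e2 * np.eye(n)

# Three one-dimensional training inputs.
X = np.array([[0.0], [0.5], [1.0]])
Sigma_n = gram_matrix(X)
```

The resulting matrix is symmetric, and each diagonal element equals \({\sigma_{b}}^{2}+{\sigma_{e}}^{2}\) because \(\delta_{i,j}=1\) only when \(i=j\).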
In the GPR method, if the outputs \(\mathbf{y}_{\text{obs}}=\left({y_{\text{obs}}}^{(1)}\cdots{y_{\text{obs}}}^{(n)}\right)^{T}\) corresponding to the past input vectors \(\mathbf{x}^{(1)}\cdots\mathbf{x}^{\left(n\right)}\) are used as training data, the output for a new input vector \(\mathbf{x}^{\left(n+1\right)}\) can be estimated as a normal distribution with mean \(m\left(\mathbf{x}^{\left(n+1\right)}\right)\) and variance \(\sigma^{2}\left(\mathbf{x}^{\left(n+1\right)}\right)\), which are given by:
\(m\left(\mathbf{x}^{(n+1)}\right)=\mathbf{k}\,{\mathbf{\Sigma}_{n}}^{-1}\mathbf{y}_{\text{obs}}\) (9)
\(\sigma^{2}\left(\mathbf{x}^{(n+1)}\right)=K\left(\mathbf{x}^{(n+1)},\mathbf{x}^{(n+1)}\right)-\mathbf{k}\,{\mathbf{\Sigma}_{n}}^{-1}\mathbf{k}^{T}\) (10)
where:
\(\mathbf{k}=\left[K\left(\mathbf{x}^{(1)},\mathbf{x}^{(n+1)}\right)\ \cdots\ K\left(\mathbf{x}^{\left(i\right)},\mathbf{x}^{\left(n+1\right)}\right)\ \cdots\ K\left(\mathbf{x}^{(n)},\mathbf{x}^{(n+1)}\right)\right]\) (11)
and \(\mathbf{\Sigma}_{n}\) is the n × n covariance matrix of the training outputs, whose (i, j) element is \(K\left(\mathbf{x}^{(i)},\mathbf{x}^{(j)}\right)\) of Eq. (8).
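Eqs. (9)–(11) can be sketched in a few lines: build the covariance matrix of the training outputs, build the vector \(\mathbf{k}\), and combine them for the predictive mean and variance. As before, a Gaussian (RBF) kernel is an assumed stand-in for the unspecified \(\varphi\), and the noise level is illustrative.

```python
import numpy as np

def kernel(xi, xj, sigma_b2=1.0, gamma=1.0):
    # Assumed RBF stand-in for phi(x_i) phi(x_j)^T * sigma_b^2.
    return sigma_b2 * np.exp(-gamma * np.sum((xi - xj) ** 2))

def gpr_predict(X, y_obs, x_new, sigma_e2=1e-2):
    """Predictive mean and variance of y(x_new), Eqs. (9)-(11)."""
    n = len(X)
    # Covariance matrix of the training outputs (Eq. 8 applied pairwise).
    Sigma_n = np.array([[kernel(X[i], X[j]) for j in range(n)]
                        for i in range(n)]) + sigma_e2 * np.eye(n)
    # k: covariances between each training input and x_new, Eq. (11).
    k = np.array([kernel(X[i], x_new) for i in range(n)])
    inv = np.linalg.inv(Sigma_n)
    m = k @ inv @ y_obs                         # Eq. (9)
    var = kernel(x_new, x_new) - k @ inv @ k    # Eq. (10)
    return m, var

# Query at a point that coincides with a training input: the mean stays
# near the observed value and the variance shrinks toward the noise level.
X = np.array([[0.0], [0.5], [1.0]])
y_obs = np.array([0.0, 0.25, 1.0])
m, var = gpr_predict(X, y_obs, np.array([0.5]))
```

In practice one would factor \(\mathbf{\Sigma}_{n}\) (e.g. a Cholesky decomposition) rather than invert it, but the explicit inverse mirrors Eqs. (9) and (10) most directly.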
Probability in target range
For the evaluation of each candidate in Bayesian optimization, the probability P that each y is within the target range \(\left(y_{\min}\leq y\leq y_{\max}\right)\) is calculated25. It is assumed that the output \(y^{(i)}\) follows a normal distribution with mean \(m\left({\mathbf{x}_{\text{new}}}^{(i)}\right)\) and variance \(\sigma^{2}\left({\mathbf{x}_{\text{new}}}^{(i)}\right)\) when a new sample \({\mathbf{x}_{\text{new}}}^{(i)}\) is input into the GPR model. P is thus calculated by integrating the normal distribution over the target range, as given by:
\(P\left({\mathbf{x}_{\text{new}}}^{(i)}\right)=\int_{y_{\min}}^{y_{\max}}{\frac{1}{\sqrt{2\pi\sigma^{2}\left({\mathbf{x}_{\text{new}}}^{(i)}\right)}}\exp\left\{-\frac{1}{2\sigma^{2}\left({\mathbf{x}_{\text{new}}}^{(i)}\right)}\left(y-m\left({\mathbf{x}_{\text{new}}}^{(i)}\right)\right)^{2}\right\}dy}\) (12)
Figure 2 shows the basic concept of P. When the predicted values are close to the target range of y, the probability is large for candidates like \(c_1\), whose mean \(m\left(\mathbf{x}\right)\) is close to the target range; when the predicted values are far from the target range of y, the probability is large for candidates like \(c_2\), which have high \(\sigma^{2}\left(\mathbf{x}\right)\).
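Because Eq. (12) is the integral of a normal density, it can be evaluated with the standard normal CDF rather than by numerical integration. A minimal sketch, using illustrative mean/variance pairs in the spirit of the two candidates discussed above (one mean inside the target range, one distant mean with large variance):

```python
import math

def prob_in_range(m, var, y_min, y_max):
    """Eq. (12): probability that y ~ N(m, var) lies in
    [y_min, y_max], evaluated via the standard normal CDF."""
    sd = math.sqrt(var)
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf((y_max - m) / sd) - cdf((y_min - m) / sd)

# Illustrative GPR predictions (mean, variance) for two candidates:
# c1 predicts inside the target range; c2 is far away but uncertain.
candidates = {"c1": (0.9, 0.05), "c2": (2.0, 1.5)}
y_min, y_max = 0.8, 1.2
P = {name: prob_in_range(mu, v, y_min, y_max)
     for name, (mu, v) in candidates.items()}
best = max(P, key=P.get)   # candidate proposed for the next experiment
```

Selecting the maximum-P candidate in each iteration is the adaptive step: the chosen candidate is simulated, the GPR model is refit with the new result, and the probabilities are recomputed.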