[insert Table 4 here]
Step 3: Model identification
The third step is to identify the models for the average sensorial
rating on perfume smell (\(q_{s}\)) and the four target properties. The
models are elaborated below.
ANN-based surrogate model for sensorial rating
A surrogate model is developed to predict \(q_{s}\). Perfume
sensorial data are generated by matching general consumers'
preferences reflected on various perfume review websites; these data
are used to represent consumers' satisfaction. A total of 761 data
samples are available at
https://github.com/zx2012flying/Perfume-Case-Study. These samples
involve only the 48 ingredient candidates in Table 4. For each
data sample, the input data includes the selected ingredients and their
volume fractions. The output data is the overall sensorial rating. For
consistency, the ratings are scaled to [0, 100] with 100 denoting
the best smell. The minimum and maximum ratings for these samples are
50.2 and 89.7, respectively. Based on these data, several surrogate
models, including linear regression, artificial neural network (ANN), and
support vector regression, are built using the Surrogate Modeling
Toolbox, Pyrenn, and Scikit-learn packages in Python 3.6. The
hyperparameters are tuned manually, and model accuracy is evaluated
through 10-fold cross-validation. A three-layer ANN model (i.e., one
input layer, one hidden layer, and one output layer) was found to offer
the highest accuracy. Figure S1 shows the schematic structure of the ANN
model. The tansig and purelin transfer functions are applied in the
hidden and output layers, respectively. The number of neurons in the
hidden layer is tuned to 8. Figure 4 presents the histogram of the absolute errors
between the true values and predicted values
(\(q_{s}^{\text{true}}-q_{s}^{\text{pre}}\)). 90% of the deviations
are less than 10. The mean absolute error (MAE) and mean absolute
percentage error (MAPE) are 4.8 and 6.9%, respectively. This
ANN model provides an accurate prediction of \(q_{s}\), which is
explicitly expressed as
\(q_{s}=\sum_{l=1}^{8}{wo_{l}\cdot f_{h}(ah_{l})}+bo\) (21)
\(f_{h}\left(ah_{l}\right)=1-\frac{2}{1+e^{2\,ah_{l}}},\quad l=1,\ldots,8\) (22)
\(ah_{l}=\sum_{i=1}^{48}{\text{wh}_{l,i}\cdot V_{i}}+bh_{l},\quad l=1,\ldots,8\) (23)
where \(wo_{l}\) and \(bo\) are the weights and bias in the
output layer, respectively; \(f_{h}\) is the tansig function in the
hidden layer; \(ah_{l}\) is the intermediate variable of neuron \(l\) in
the hidden layer; and \(\text{wh}_{l,\ i}\) and \(bh_{l}\) are the
weights and biases in the hidden layer, respectively. These model
parameters are provided in the GitHub repository mentioned above.
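For clarity, the forward pass defined by Eqs. (21)-(23) can be sketched in Python/NumPy as follows. The weight and bias arrays below are random placeholders for illustration only (the fitted parameters are provided in the GitHub repository), and the helper names `tansig` and `predict_qs` are our own, not from the repository.

```python
import numpy as np

def tansig(a):
    # Eq. (22): tansig transfer function; algebraically equal to tanh(a)
    return 1.0 - 2.0 / (1.0 + np.exp(2.0 * a))

def predict_qs(V, wh, bh, wo, bo):
    """Three-layer ANN forward pass, Eqs. (21)-(23).

    V  : (48,)   volume fractions of the ingredient candidates
    wh : (8, 48) hidden-layer weights;  bh : (8,) hidden-layer biases
    wo : (8,)    output-layer weights;  bo : scalar output-layer bias
    """
    ah = wh @ V + bh            # Eq. (23): hidden-layer pre-activations
    h = tansig(ah)              # Eq. (22): hidden-layer activations
    return float(wo @ h + bo)   # Eq. (21): purelin (identity) output

# Placeholder parameters; the actual fitted values are on GitHub.
rng = np.random.default_rng(0)
wh = rng.normal(size=(8, 48))
bh = rng.normal(size=8)
wo = rng.normal(size=8)
bo = 0.0

# Hypothetical three-ingredient blend (volume fractions sum to 1).
V = np.zeros(48)
V[:3] = [0.5, 0.3, 0.2]
qs = predict_qs(V, wh, bh, wo, bo)
```

Because purelin is the identity, the output is simply a weighted sum of the hidden activations plus a bias, which is why the model can be written out explicitly as in Eqs. (21)-(23).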