[insert Figure 2 here]

Step 1: Problem Definition

When a new cosmetic design project is launched, the product type and form, such as a facial powder or lipstick, are first decided by the marketing team based on the target market, potential consumers, competing products, etc. The quality of the new cosmetic product depends on its sensorial and functional attributes. Usually, the consumer-desired attributes can be specified through interviews and surveys with potential consumers. The sensorial perceptions that a cosmetic delivers are the most important factor in consumer satisfaction and repeated use.6 In practice, sensorial perception is assessed through sensorial evaluation. A number of panelists assess various cosmetic samples using well-defined protocols, and their perceptions are quantified using sensorial ratings. Then, an overall sensorial rating can be obtained to represent the degree of satisfaction with the cosmetic.7 Note that, in addition to perception, other factors such as packaging and price affect consumers’ purchase decisions. These factors are not considered in this work; the objective is to maximize the overall sensorial rating (\(q\)).
\(\max\ q\) (1)
In addition, the functional attributes must also be satisfied. Each cosmetic has its own functional attributes. For instance, a hair spray should dry rapidly and a perfume should be transparent (Table 2).
The product attributes can be translated into relevant physicochemical properties (e.g., melting point for lipstick) and product specifications (e.g., sun protection factor for a sunscreen product) using engineering know-how. For the four cosmetics in Table 2, the last column lists various properties related to their sensorial and functional attributes. For example, how a lipstick is sensed by the lips is affected by its viscosity, and the pH of a skin cream affects its safety. A set of design targets (i.e., lower and upper bounds) on the properties can then be identified based on engineering know-how and in-house product data. These bounds serve as constraints in the optimization problem.
\(PL^{k}\leq P^{k}\leq PU^{k},\ \ \ k\in K\) (2)
where \(P^{k}\) is the \(k\)-th desired property, \(K\) is the set of properties, and \(PL^{k}\) and \(PU^{k}\) are the lower and upper bounds, respectively. Note that the nomenclature is presented in the Supporting Information.
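As a minimal sketch of how the design targets in Eq. (2) act as constraints, the following Python snippet reports which properties of a candidate product fall outside their bounds. The function name, the property names, and all numerical values are hypothetical and chosen only for illustration; they are not taken from Table 2.

```python
from typing import Dict, List, Tuple


def violated_properties(
    values: Dict[str, float],
    bounds: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Return the properties k whose value P^k lies outside [PL^k, PU^k] (Eq. 2)."""
    return [
        k
        for k, (lower, upper) in bounds.items()
        if not (lower <= values[k] <= upper)
    ]


# Hypothetical design targets for a skin cream (illustrative numbers only).
bounds = {"pH": (4.5, 6.5), "viscosity_Pa_s": (10.0, 50.0)}
candidate = {"pH": 5.5, "viscosity_Pa_s": 62.0}

print(violated_properties(candidate, bounds))
```

A candidate is acceptable with respect to Eq. (2) only when the returned list is empty.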

Step 2: Ingredient Candidate Generation

To provide multiple desired attributes, many chemical ingredients are needed. Cosmetic ingredients are classified into different types based on their functionalities. Table S1 lists the ingredient types that are widely used in various cosmetics and their functions.7,28 For instance, an abrasive in a facial cleanser is made up of solid particles used for physically cleaning hard surfaces such as the epidermis. Three types of moisturizers (i.e., emollient, humectant, and occlusive) can be used to provide a hydration effect. An emollient improves the skin’s water-oil balance, a humectant inhibits water evaporation, and an occlusive forms a water-repellent layer that reduces water loss. For a cosmetic, the needed ingredient types can be identified based on fundamental formulation science and the desired product attributes.
For each ingredient type, a set of ingredient candidates can be generated using databases29,30 and computer-aided tools.14,31 For each of the ingredient types in Table S1, the last column lists two commonly used ingredient candidates. For instance, lactic acid and triethanolamine are often used as acidic and alkaline pH buffers, respectively. After years of development in the cosmetic industry, hundreds of ingredient candidates exist for each ingredient type. To reduce the search space, the candidates can be pre-screened using ingredient screening tools based on cost, regulations, availability, etc. to generate a more organized pool of ingredient candidates:
\(\left.\begin{matrix}I_{A}=\left\{I_{A,1},I_{A,2},\ldots,I_{A,a}\right\}\\ I_{B}=\left\{I_{B,1},I_{B,2},\ldots,I_{B,b}\right\}\\ \cdots\\ I_{Z}=\left\{I_{Z,1},I_{Z,2},\ldots,I_{Z,z}\right\}\end{matrix}\ \right\}\) (3)
where \(I_{A}\), \(I_{B}\), …, \(I_{Z}\) are ingredient types, and \(I_{A,1}\), \(I_{A,2}\), …, \(I_{A,a}\), etc. represent the generated ingredient candidates. Here, the subscripts \(a\), \(b\), and \(z\) denote the number of candidates in each ingredient type. Each candidate has different properties (e.g., density, solubility, and pH), which can be collected from the literature, databases, and experiments. The selection of ingredients is naturally a discrete-continuous optimization problem. Each ingredient candidate can be assigned a binary variable \(S_{i}\) to control ingredient selection and a continuous variable (e.g., volume fraction \(V_{i}\)) to denote its composition. If the \(i\)-th candidate is selected, \(S_{i}\) is equal to 1 and \(V_{i}\) is constrained by its lower (\(VL_{i}\)) and upper (\(VU_{i}\)) bounds. Otherwise, \(S_{i}\) and \(V_{i}\) are both equal to 0.
\(\sum_{i}V_{i}=1\) (4)
\(VL_{i}\cdot S_{i}\leq V_{i}\leq VU_{i}\cdot S_{i},\ \ i\in\left\{I_{A,1},I_{A,2},\ldots,I_{Z,z}\right\}\) (5)
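The coupling between the binary and continuous variables in Eqs. (4) and (5) can be illustrated with a short feasibility check. The sketch below assumes a hypothetical four-candidate pool with made-up volume-fraction limits; the function name and the tolerance are also assumptions made for illustration.

```python
from typing import Dict, Tuple


def is_feasible(
    S: Dict[str, int],
    V: Dict[str, float],
    limits: Dict[str, Tuple[float, float]],
    tol: float = 1e-9,
) -> bool:
    """Check Eq. (4) and Eq. (5) for one candidate formulation."""
    # Eq. (4): the volume fractions of all candidates must sum to one.
    if abs(sum(V.values()) - 1.0) > tol:
        return False
    # Eq. (5): VL_i * S_i <= V_i <= VU_i * S_i, which forces V_i = 0 when S_i = 0.
    for i, (vl, vu) in limits.items():
        if not (vl * S[i] - tol <= V[i] <= vu * S[i] + tol):
            return False
    return True


# Hypothetical candidate pool with (VL_i, VU_i) limits; values are illustrative.
limits = {
    "water": (0.50, 0.90),
    "glycerin": (0.02, 0.20),
    "sorbitol": (0.02, 0.20),
    "mineral_oil": (0.05, 0.30),
}
S = {"water": 1, "glycerin": 1, "sorbitol": 0, "mineral_oil": 1}
V = {"water": 0.75, "glycerin": 0.05, "sorbitol": 0.0, "mineral_oil": 0.20}

print(is_feasible(S, V, limits))
```

In an actual optimization these relations would be written as constraints in the solver model rather than checked after the fact; the check only makes the logic of Eq. (5) explicit.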
In addition to ingredients, microstructure can affect the properties when certain product forms are used. Typically, the major microstructural features can be characterized by geometric descriptors that can be correlated with the mixture properties through experiments and multi-scale modeling to account for the microstructure-property relationship.32 The last column of Table 1 lists the relevant microstructure descriptors for various commonly used cosmetic product forms. For example, the oil droplet size affects the viscosity and texture of a moisturizing lotion in the form of an oil-in-water emulsion.33 The emulsion type and particle shape can be decided using heuristics.34 Geometric descriptors \(ms\), such as particle size, are continuous variables:
\(msL\leq ms\leq msU\) (6)
where \(msL\) and \(msU\) are the lower and upper bounds, respectively. The microstructure is decided by both the formulation and manufacturing process design.35

Step 3: Model Identification

Model for Sensorial Perception

A surrogate model that captures the input-output data is built to predict the sensorial rating. After the surrogate model is trained, its analytical form can be used for optimization.
\(q=f\left(V_{I_{A,1}},V_{I_{A,2}},\ldots,V_{I_{Z,z}},ms\right)\) (7)
The first task is to collect training data. The input data are the cosmetic recipes and the microstructures, namely (\(V_{I_{A,1}},V_{I_{A,2}},\ldots,V_{I_{Z,z}},ms\)). The output data are the corresponding sensorial ratings (\(q\)). Historical sensorial evaluation data can be used here. When historical data are scarce, additional data sampling is required. To date, many efficient sampling approaches have been used in the cosmetic industry, such as Latin hypercube sampling, Plackett-Burman designs, and factorial designs. Following the “one in ten” rule, the number of data samples should preferably be at least ten times the number of ingredient candidates. The second task is to build an accurate surrogate model. Many types of surrogate models can be used, such as linear regression, kriging, artificial neural networks (ANNs), and radial basis functions. Some surrogate models (e.g., random forest) do not provide derivative information, whereas the derivatives of many others, such as linear regression and ANNs with the tansig activation function, are available symbolically.36 A surrogate model with available derivative information is preferred here because solving a discrete-continuous optimization problem without derivative information is very challenging. The hyperparameters of the surrogate model should be carefully tuned; the heuristics and experience reported in the literature can be consulted.36,37 Afterward, model accuracy needs to be validated. Widely used validation methods include K-fold cross-validation and the holdout method. If the model is not sufficiently accurate, the surrogate model type and the hyperparameters should be re-selected.
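The two tasks described above, sampling and validation, can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the workflow used in this work: the two inputs (one volume fraction and one droplet size), their ranges, and the synthetic rating function are hypothetical, and an ordinary least-squares model stands in for whichever surrogate type is eventually chosen.

```python
import random
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def latin_hypercube(n: int, bounds: Sequence[Tuple[float, float]],
                    seed: int = 0) -> List[List[float]]:
    """Draw n samples with exactly one sample per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([lo + (hi - lo) * (s + rng.random()) / n for s in strata])
    return [list(row) for row in zip(*columns)]


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float]) -> Model:
    """Least-squares linear model with intercept, solved via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(a[r] * a[c] for a in A) for c in range(p)]
         + [sum(a[r] * yi for a, yi in zip(A, y))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    beta = [M[r][p] / M[r][r] for r in range(p)]
    return lambda x: beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def k_fold_rmse(X: Sequence[Sequence[float]], y: Sequence[float],
                fit: Callable[..., Model], k: int = 5, seed: int = 0) -> float:
    """Root-mean-square prediction error over k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq_err += sum((model(X[i]) - y[i]) ** 2 for i in held_out)
    return (sq_err / len(X)) ** 0.5


# Hypothetical inputs: one volume fraction and one droplet size (illustrative ranges).
bounds = [(0.05, 0.30), (1.0, 10.0)]
X = latin_hypercube(20, bounds)  # 20 samples for 2 inputs (ten per input)
# Synthetic, noise-free ratings used only to exercise the code; not real panel data.
y = [5.0 + 8.0 * v - 0.2 * ms for v, ms in X]

rmse = k_fold_rmse(X, y, fit_linear, k=5)
print(f"5-fold cross-validated RMSE: {rmse:.2e}")
```

Because the synthetic ratings are exactly linear, the cross-validated error here is close to zero; real sensorial data would not behave this way. If all volume fractions of a recipe were used as inputs together with an intercept, Eq. (4) would make the columns linearly dependent, so one fraction would have to be dropped or a different model form used.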

Models for Target Properties

Three types of models can be applied to predict the target properties: rigorous mechanistic models, short-cut models, and surrogate models. Typically, the formulation and application of cosmetics involve various phenomena (e.g., kinetics, thermodynamics, and transport). For any property, the associated phenomena should first be identified based on basic engineering science and domain knowledge, followed by the identification of the relevant mechanistic models. Generally, rigorous models are the most accurate but are more complex and sometimes contain unknown parameters. The perfume diffusion model38 and the ingredient percutaneous absorption model39 are examples. Instead of fully accounting for the physical phenomena, a simple short-cut model captures the property’s dependence on the most influential factors. A short-cut model is usually sufficiently accurate within pre-specified conditions. Note that both rigorous and short-cut models involve many intermediate variables for describing the relevant phenomena. The rigorous or short-cut model for the \(k\)-th desired property (\(P^{k}\)) can be represented as
\(P^{k}=G^{k}(IM_{I_{A,1}}^{k},IM_{I_{A,2}}^{k},\ldots,IM_{I_{Z,z}}^{k})\),\(k\in K\) (8)
\(IM_{i}^{k}=IMG^{k}\left(V_{i},ms\right),\ \ i\in\left\{I_{A,1},I_{A,2},\ldots,I_{Z,z}\right\}\) (9)
where \(IM_{i}^{k}\) denotes the intermediate variable related to the \(i\)-th ingredient candidate (e.g., vapor pressure and activity coefficient). If there are no suitable mechanistic models but data are available, surrogate models can be adopted,40 although the model validity is often limited to the range of available data. The input data should be the sampled cosmetic recipes and microstructures. The output data are the target properties. For the \(k\)-th property (\(P^{k}\)), its surrogate model is
\(P^{k}=g^{k}\left(V_{I_{A,1}},V_{I_{A,2}},\ldots,V_{I_{Z,z}},ms\right),\ \ k\in K\) (10)
Accordingly, for any desired property, a set of models (rigorous, short-cut, and surrogate) should be identified for use in the optimization.
The use of heuristics is often inevitable in cosmetic formulation.24,41 The reason is that some phenomena have not been identified or are poorly understood. For instance, a hydrocolloid thickener with a weak gel network structure is preferred for use in emulsion-based products to generate thixotropic behavior, although no formal justification has been given.34 In addition, heuristics can effectively help reduce the search space. Many heuristics, although not all, can be transformed into mathematical design constraints for use in the optimization. Table 3 shows the widely used forms of heuristics and the associated equations for formulated product design. For instance, if the number of ingredients of a certain ingredient type \(I_{X}\) is suggested, an inequality constraint \(TL\leq\sum_{i}S_{i}\leq TU,\ \ i\in I_{X}\) can be generated, with \(TL\) and \(TU\) as the suggested lower and upper limits on that number.
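The counting heuristic above can be illustrated with a few lines of Python. The humectant names, the selection vector, and the limits of one to two ingredients are hypothetical values chosen for illustration and are not taken from Table 3.

```python
from typing import Dict, Iterable


def count_within_limits(
    S: Dict[str, int],
    ingredient_type: Iterable[str],
    TL: int,
    TU: int,
) -> bool:
    """Check TL <= sum_i S_i <= TU over the candidates i of one ingredient type I_X."""
    n_selected = sum(S[i] for i in ingredient_type)
    return TL <= n_selected <= TU


# Hypothetical humectant candidates and selection decisions.
humectants = ["glycerin", "sorbitol", "propylene_glycol"]
S = {"glycerin": 1, "sorbitol": 0, "propylene_glycol": 1, "water": 1}

# Assumed heuristic: use at least one and at most two humectants.
print(count_within_limits(S, humectants, TL=1, TU=2))
```

Since the constraint is linear in the binary variables \(S_{i}\), it can be added directly to the discrete-continuous optimization problem.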