2.4. Signal pre-processing and feature extraction
The analog output voltages of the sensors were sampled at a frequency of 2 \(s^{-1}\) throughout the cultivation process. Fig. 2 shows a typical sensor circuit and its interface diagram.
In Fig. 2, \(V_{h}\) is the fixed heater voltage of the sensor (5 V), \(V_{C}\) is the fixed upper reference voltage (5 V), \(R_{L}\) is the load resistor and \(R_{S}\) is the sensor resistance. In pure air \(R_{S}\) is high; in the presence of detectable gases, \(R_{S}\) varies with the gas concentration. By measuring the voltage across the load resistor \(R_{L}\), the sensor response (\(V_{0}\)) can be calculated from the following equation:
\(V_{0} = \frac{R_{L}}{R_{L}+R_{S}} \cdot V_{C}\)
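The voltage-divider relation above can be illustrated with a minimal Python sketch. The function names and the 10 kΩ load resistor are hypothetical choices for illustration only; the source states only that \(V_{C}\) is 5 V. The second function inverts the same equation to recover \(R_{S}\) from a measured \(V_{0}\).

```python
def sensor_response(r_s, r_l, v_c=5.0):
    """Voltage across the load resistor R_L for a given sensor resistance R_S."""
    return r_l / (r_l + r_s) * v_c


def sensor_resistance(v_0, r_l, v_c=5.0):
    """Invert the divider equation: R_S = R_L * (V_C - V_0) / V_0."""
    if v_0 <= 0:
        raise ValueError("v_0 must be positive to recover R_S")
    return r_l * (v_c - v_0) / v_0


# Hypothetical values: R_L = 10 kOhm, R_S = 40 kOhm, V_C = 5 V
R_L = 10e3
v0 = sensor_response(r_s=40e3, r_l=R_L)
rs = sensor_resistance(v0, r_l=R_L)
```

As \(R_{S}\) drops, a larger share of \(V_{C}\) appears across \(R_{L}\), so the measured response \(V_{0}\) rises.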
Each sensor reacts differently to the volatiles in the headspace gas of the bioreactor. The output voltage is related to the ethanol concentration but does not directly give the concentration level. However, it is known that when the concentration level changes, the output voltage responses of the sensors also change (Omatu and Yano 2016; Kiani, Minaei et al. 2016).
For ethanol prediction, the sensor response must first be pre-processed to obtain comprehensible signals. Second, features are extracted from the pre-processed signals. Third, dimensionality reduction is performed on the extracted features: the feature vector is projected onto a lower-dimensional space in order to avoid the problems associated with high-dimensional, sparse datasets and redundancy (Aguilera, Lozano et al. 2012). Finally, the reduced features are fed to the prediction model. Fig. 3 shows the block diagram of the proposed ethanol prediction algorithm.
One of the simplest signal pre-processing methods, which is also widely used for drift compensation, is the transformation of each individual sensor signal based on the initial value (baseline) of the sensor response. This transformation compensates for noise, drift and inherently large or small signals (Di Carlo and Falasconi 2012). For this reason, the following equation was applied.
where \(V_{i}\) is the response (output voltage) of the \(i\)-th sensor, \(V_{0,i}\) is its baseline and \(S_{i}\) is the corresponding modified signal.
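The transformation equation itself is not reproduced in this text, so the sketch below does not claim to show the form the authors applied. It only illustrates, as an assumption, three baseline transformations commonly used for gas-sensor signals (differential, relative and fractional), written with the symbols defined above; the function name and the `method` argument are hypothetical.

```python
def baseline_correct(v, v0, method="fractional"):
    """Transform raw sensor voltages v using the baseline voltage v0.

    Assumed, commonly used variants (not confirmed by the source):
      differential: S = V - V0
      relative:     S = V / V0
      fractional:   S = (V - V0) / V0
    """
    if method == "differential":
        return [x - v0 for x in v]
    if v0 == 0:
        raise ValueError("baseline must be non-zero for this method")
    if method == "relative":
        return [x / v0 for x in v]
    if method == "fractional":
        return [(x - v0) / v0 for x in v]
    raise ValueError(f"unknown method: {method}")


# Hypothetical readings of one sensor (volts) with a 1.0 V baseline
signal = baseline_correct([1.0, 1.5, 2.0], v0=1.0)
```

The fractional form yields a dimensionless signal, which is one way to put sensors with inherently large or small responses on a comparable scale.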
The next step in the ethanol prediction algorithm is the extraction of useful features from the output signals. In each measurement cycle, the sensors are exposed to the headspace gas of the bioreactor for 10 seconds, which changes the output signals. The odorant is then flushed out of the sensor with oxygen gas and the sensor returns to its baseline. The time during which the sensor is exposed to the odorant is referred to as the transient phase, while the time the sensor takes to return to its baseline resistance is called the recovery phase. In order to exploit the information obtained in the transient phase, two representative features were extracted from each sensor:
In total, each measurement cycle was characterized by 6 variables (i.e., 3 sensors × 2 features per sensor). In order to quantify the amount of information useful for predicting ethanol concentration contained in these variables, principal component analysis (PCA) was performed. PCA finds a new coordinate system for the mean-centered data set whose axes are mutually perpendicular and ordered by decreasing variance. The direction containing most of the variance of the data is called the first principal component (PC). The second PC carries the maximum of the remaining variance, and so on. These PCs are uncorrelated with each other (Otto, 1999; AS, 2006).
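The PCA procedure described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the data matrix is synthetic random data standing in for the 6 variables per measurement cycle, and the function name `pca` and the choice of 50 cycles are assumptions. The sketch mean-centers the data and obtains the components from a singular value decomposition.

```python
import numpy as np


def pca(X):
    """PCA of X (rows: measurement cycles, columns: variables) via SVD."""
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance fraction per PC
    scores = Xc @ Vt.T                           # data in the new coordinates
    return scores, Vt, explained


# Synthetic placeholder: 50 cycles x 6 variables (3 sensors x 2 features)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))

scores, components, explained = pca(X)
```

The rows of `components` are the orthonormal PC directions, `explained` is sorted in decreasing order, and the columns of `scores` are mutually uncorrelated, matching the properties listed in the text.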