2.4. Signal pre-processing and feature extraction
The analog output voltages of the sensors were sampled at a frequency of 2 \(s^{-1}\) (2 Hz) throughout the cultivation process.
Fig. 2 shows a typical sensor circuit and its interface diagram.
In Fig. 2, \(V_{h}\) is the fixed heater voltage of the sensor (5 V), \(V_{C}\) is the fixed upper reference voltage of the circuit (5 V), \(R_{L}\) is the load resistor and \(R_{S}\) is the sensor resistance. In clean air, \(R_{S}\) is high; in the presence of detectable gases, \(R_{S}\) changes with the gas concentration. The sensor response \(V_{0}\) is the voltage measured across \(R_{L}\). Because \(R_{L}\) and \(R_{S}\) form a voltage divider across \(V_{C}\), it is given by the following equation:
\(V_{0} = \frac{R_{L}}{R_{L}+R_{S}}\, V_{C}\)
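The divider equation can also be inverted to recover the sensor resistance from a measured response, since \(R_{S} = R_{L}(V_{C} - V_{0})/V_{0}\). The Python sketch below evaluates both directions. The function names and the 10 kΩ load resistance and \(R_{S}\) values in the usage lines are hypothetical illustration values, not the actual components of the circuit in Fig. 2.

```python
def divider_output(r_load: float, r_sensor: float, v_c: float = 5.0) -> float:
    """V0 = R_L / (R_L + R_S) * V_C: the voltage across the load resistor."""
    return r_load / (r_load + r_sensor) * v_c


def sensor_resistance(v0: float, r_load: float, v_c: float = 5.0) -> float:
    """Invert the divider equation: R_S = R_L * (V_C - V0) / V0."""
    if not 0.0 < v0 < v_c:
        raise ValueError("V0 must lie strictly between 0 and V_C")
    return r_load * (v_c - v0) / v0


R_L = 10e3  # hypothetical load resistance in ohms

print(divider_output(R_L, 40e3))    # R_S = 40 kOhm -> 1.0
print(divider_output(R_L, 10e3))    # R_S drops to 10 kOhm, V0 rises -> 2.5
print(sensor_resistance(1.0, R_L))  # recover R_S from a measured V0 -> 40000.0
```

A lower sensor resistance therefore shows up as a higher voltage across \(R_{L}\), which is why \(V_{0}\) can be used directly as the sensor response.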
Each sensor responds differently to the volatiles in the headspace gas of the bioreactor. The output voltage is related to the ethanol concentration, but it does not directly give the concentration level. However, it is known that when the concentration changes, the output voltages of the sensors also change (Omatu and Yano 2016; Kiani, Minaei et al. 2016).
For ethanol prediction, the sensor responses are first pre-processed to obtain interpretable signals. Second, features are extracted from the pre-processed signals. Third, dimensionality reduction is performed on the extracted features. Dimensionality reduction projects the feature vector onto a lower-dimensional space in order to avoid problems associated with high-dimensional, sparse datasets and redundancy (Aguilera, Lozano et al. 2012). Finally, the reduced features are fed into the prediction model. Fig. 3 shows the block diagram of the proposed ethanol prediction algorithm.
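The four stages can be sketched as a chain of functions applied in the order of Fig. 3. This is a minimal Python sketch under stated assumptions: the class name `EthanolPipeline`, the array layout (cycles × 3 sensors × 20 samples, i.e., 10 s at 2 samples per second) and every stand-in stage in the usage lines are hypothetical placeholders that only show how data flows and how array shapes change, not the methods used in this work.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

Array = np.ndarray


@dataclass
class EthanolPipeline:
    """Hypothetical skeleton: the four stages of Fig. 3, applied in order."""

    preprocess: Callable[[Array], Array]        # raw voltages -> baseline-corrected signals
    extract_features: Callable[[Array], Array]  # signals -> feature matrix (cycles x 6)
    reduce: Callable[[Array], Array]            # features -> lower-dimensional scores
    model: Callable[[Array], Array]             # scores -> predicted ethanol concentration

    def predict(self, raw: Array) -> Array:
        signals = self.preprocess(raw)
        features = self.extract_features(signals)
        scores = self.reduce(features)
        return self.model(scores)


# Usage with placeholder stages (none of them is the method used in this work).
rng = np.random.default_rng(0)
raw = 1.0 + rng.random((4, 3, 20))  # 4 cycles x 3 sensors x 20 samples

pipe = EthanolPipeline(
    # fractional change relative to the first sample of each sensor
    preprocess=lambda v: (v - v[..., :1]) / v[..., :1],
    # final value (peak height) and rectangle-rule area (dt = 0.5 s) per sensor -> 6 columns
    extract_features=lambda s: np.concatenate([s[..., -1], 0.5 * s.sum(axis=-1)], axis=1),
    # placeholder reduction: keep the first two columns
    reduce=lambda f: f[:, :2],
    # placeholder linear model with made-up weights
    model=lambda z: z @ np.array([0.7, 0.3]),
)

predictions = pipe.predict(raw)
print(predictions.shape)  # (4,)
```

Keeping each stage as a separate callable makes it possible to swap one stage (for example, the dimensionality reduction) without touching the others.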
One of the simplest and most widely used signal pre-processing methods for drift compensation is to transform each sensor signal relative to the initial value (baseline) of its response. This transformation compensates for noise and drift, and also for inherently large or small signals (Di Carlo and Falasconi 2012). For this reason, the following equation was applied:
where \(V_{i}\) is the response (output voltage) of the \(i\)th sensor, \(V_{0,i}\) is its baseline, and \(S_{i}\) is the modified signal.
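Baseline manipulation is commonly done in one of three forms: differential (\(V_{i} - V_{0,i}\)), relative (\(V_{i}/V_{0,i}\)) or fractional (\((V_{i} - V_{0,i})/V_{0,i}\)). The Python sketch below implements all three behind a `method` parameter; which form corresponds to the equation used here is an assumption, as is taking each baseline as the mean of the first `n_baseline` samples. The function name `baseline_correct` is hypothetical.

```python
import numpy as np


def baseline_correct(v, n_baseline: int = 1, method: str = "fractional") -> np.ndarray:
    """Transform each sensor signal relative to its own baseline V_{0,i}.

    v: array of shape (n_sensors, n_samples), one row per sensor.
    The baseline of each sensor is assumed to be the mean of its first
    n_baseline samples (taken before exposure to the headspace gas).
    """
    v = np.asarray(v, dtype=float)
    v0 = v[:, :n_baseline].mean(axis=1, keepdims=True)  # V_{0,i}, shape (n_sensors, 1)
    if method == "differential":
        return v - v0              # S_i = V_i - V_{0,i}
    if method == "relative":
        return v / v0              # S_i = V_i / V_{0,i}
    if method == "fractional":
        return (v - v0) / v0       # S_i = (V_i - V_{0,i}) / V_{0,i}
    raise ValueError(f"unknown method: {method!r}")


# Two hypothetical sensors whose voltages differ by a factor of 8.
v = np.array([[0.25, 0.375, 0.5],
              [2.0, 3.0, 4.0]])
print(baseline_correct(v, method="differential"))  # magnitudes still differ
print(baseline_correct(v, method="fractional"))    # both rows become [0, 0.5, 1]
```

The usage lines illustrate the compensation for inherently large or small signals: the differential form keeps the eightfold difference between the two sensors, whereas the relative and fractional forms map both sensors onto the same modified signal.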
The next step in the ethanol prediction algorithm is to extract useful features from the output signals. In each measurement cycle, the sensors are exposed to the headspace gas of the bioreactor for 10 seconds, which changes their output signals. The odorant is then flushed out of the sensor with oxygen, and the sensor returns to its baseline. The period during which the sensor is exposed to the odorant is referred to as the transient phase, while the time the sensor takes to return to its baseline resistance is called the recovery phase. To exploit the information obtained in the transient phase, two representative features were extracted from each sensor:
- Peak height: the difference between the final value and the baseline value of the sensor response in the transient phase.
- Peak area: the area under the sensor response curve in the transient phase.
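At 2 samples per second, a 10 s exposure yields only about 20 samples per sensor, so the peak area has to be approximated numerically. The Python sketch below assumes the trapezoidal rule and takes the first sample of the transient window as the baseline value; both choices, and the name `transient_features`, are illustrative assumptions rather than the exact procedure used here.

```python
import numpy as np

FS_HZ = 2.0       # sampling frequency stated in the text (2 samples per second)
DT = 1.0 / FS_HZ  # 0.5 s between samples


def transient_features(s, dt: float = DT) -> np.ndarray:
    """Peak height and peak area for each sensor over one transient window.

    s: array of shape (n_sensors, n_samples) covering the exposure period.
    Returns shape (2 * n_sensors,): all peak heights, then all peak areas.
    """
    s = np.asarray(s, dtype=float)
    rel = s - s[:, :1]                                        # response minus value at exposure onset
    height = rel[:, -1]                                       # final value minus baseline value
    area = dt * 0.5 * (rel[:, 1:] + rel[:, :-1]).sum(axis=1)  # trapezoidal rule
    return np.concatenate([height, area])


# Hypothetical linear responses over 0-10 s (21 samples including both ends).
t = np.arange(21) * DT
ramp = t / 10.0
window = np.vstack([ramp, 2.0 * ramp, 0.5 * ramp])  # three sensors
features = transient_features(window)
print(features)  # 3 peak heights followed by 3 peak areas
```

With three sensors the function returns six values per measurement cycle, matching the 3 sensors × 2 features layout described next.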
In total, each measurement cycle was characterized by 6 variables (i.e., 3 sensors × 2 features per sensor). To quantify how much useful information for predicting ethanol concentration these variables contain, principal component analysis (PCA) was performed. PCA finds a new coordinate system for the mean-centered data set whose axes are mutually orthogonal and ordered by decreasing variance. The direction that contains most of the variance of the data is called the first principal component (PC); the second PC carries the maximum remaining variance, and so on. The PCs are mutually uncorrelated (Otto, 1999; AS, 2006).
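PCA of the 6-variable feature matrix can be computed from the singular value decomposition (SVD) of the mean-centered data. In this Python sketch the function name `pca` is hypothetical and the feature matrix is synthetic (two made-up latent factors mixed into 6 variables plus small noise), so the explained-variance ratios it prints say nothing about the actual measurements.

```python
import numpy as np


def pca(x):
    """PCA of an (n_cycles, n_variables) matrix via SVD of the mean-centered data.

    Returns (scores, components, explained_variance_ratio); components are
    ordered by decreasing variance.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)                            # mean-center each variable
    u, s, vt = np.linalg.svd(xc, full_matrices=False)  # xc = u @ diag(s) @ vt
    scores = u * s                                     # coordinates on the PCs (= xc @ vt.T)
    var = s**2 / (x.shape[0] - 1)                      # variance along each PC
    return scores, vt, var / var.sum()


# Synthetic stand-in for 50 measurement cycles x 6 features.
rng = np.random.default_rng(42)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
x = latent @ mixing + 0.05 * rng.normal(size=(50, 6))

scores, components, ratio = pca(x)
print(np.round(ratio, 3))  # share of total variance captured by each PC
```

Because the synthetic data are driven by two factors, almost all of the variance falls on the first two PCs, and the covariance matrix of the scores is diagonal, reflecting that the PCs are uncorrelated. Peak height and peak area have different units, so in practice the variables are often standardized (autoscaled) before PCA; the sketch leaves that step out.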