2.3 Feature calculation
The next step is to calculate features from the ACC data. A feature is a
specific mathematical summary (e.g., the mean or the standard deviation)
of the ACC signal within a segment, and these features form the input to
the machine-learning models (Brown et al., 2013). Two basic feature sets
are calculated with the functions calculate_feature_time and
calculate_feature_freq. The time-domain feature set includes the mean,
variance, standard deviation, maximum, minimum, range and ODBA (Overall
Dynamic Body Acceleration), a metric that has been shown to correlate
with the animal’s energy expenditure
(Wilson et al., 2019). These features are calculated for each ACC axis
separately (denoted with prefix x, y, z in the output data frame),
except for ODBA, which is calculated from all available axes. The
frequency-domain feature set includes the main frequency, main amplitude
and frequency entropy. These features are also calculated for each ACC
axis separately (denoted with prefix x, y, z) and are based on the Fast
Fourier Transform (FFT) of the raw ACC data.
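Both feature sets can be sketched in Python for a single segment. This is not the rabc implementation (rabc is an R package); the column names, the use of each axis' segment mean as the static acceleration in ODBA, and the definitions of main amplitude (single-sided amplitude of the largest non-zero-frequency bin) and frequency entropy (Shannon entropy of the normalised power spectrum) are assumptions made for illustration. A direct DFT is used in place of an FFT; it yields the same coefficients, only more slowly.

```python
import cmath
import math
import statistics


def time_features(segment):
    """Time-domain features for one segment.

    segment: dict mapping an axis name ("x", "y", "z") to a list of ACC samples.
    Column names such as "x_mean" are hypothetical, not rabc's.
    """
    feats = {}
    for axis, values in segment.items():
        feats[f"{axis}_mean"] = statistics.mean(values)
        feats[f"{axis}_variance"] = statistics.variance(values)  # sample variance (n - 1)
        feats[f"{axis}_sd"] = statistics.stdev(values)
        feats[f"{axis}_max"] = max(values)
        feats[f"{axis}_min"] = min(values)
        feats[f"{axis}_range"] = max(values) - min(values)
    # ODBA (assumed definition): static acceleration = segment mean of each axis;
    # the absolute dynamic acceleration is summed over all axes, then averaged over samples.
    means = {axis: statistics.mean(v) for axis, v in segment.items()}
    n = len(next(iter(segment.values())))
    feats["ODBA"] = sum(
        sum(abs(segment[axis][i] - means[axis]) for axis in segment) for i in range(n)
    ) / n
    return feats


def freq_features(segment, fs):
    """Frequency-domain features for one segment sampled at fs Hz (assumed definitions)."""
    feats = {}
    for axis, values in segment.items():
        n = len(values)
        mu = statistics.mean(values)
        centred = [v - mu for v in values]  # drop the 0 Hz (DC) component
        # Direct DFT over bins 1..n//2. For real-valued data the higher bins mirror
        # these (negative frequencies), so the highest frequency represented is fs/2.
        mags = [
            abs(sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(1, n // 2 + 1)
        ]
        peak = max(range(len(mags)), key=mags.__getitem__)
        feats[f"{axis}_main_freq"] = (peak + 1) * fs / n  # bin k lies at k * fs / n Hz
        feats[f"{axis}_main_amp"] = 2 * mags[peak] / n  # single-sided amplitude (below fs/2)
        power = [m * m for m in mags]
        total = sum(power)
        probs = [p / total for p in power if p > 0] if total > 0 else []
        feats[f"{axis}_freq_entropy"] = -sum(p * math.log2(p) for p in probs)
    return feats
```

A segment is passed as, for example, `freq_features({"x": xs, "y": ys, "z": zs}, fs=25)` for hypothetical 25 Hz data. A pure sinusoid gives a frequency entropy close to zero, while an irregular signal spreads power over many bins and gives a higher value.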
Frequency entropy measures the unpredictability of the signal. Depending
on the ACC sampling settings (e.g., Gilbert et al., 2016), some of the
resulting ACC datasets may not be sampled at a high enough frequency to
capture useful frequency information (Nathan et al., 2012); a signal
sampled at fs Hz can only resolve frequencies up to fs/2 (the Nyquist
frequency). In such cases, frequency-domain features are better left out
of behaviour classification. In addition, the
functions calculate_feature_time and calculate_feature_freq provide an
essential but not exhaustive set of potential features. Since feature
engineering has been reported to improve the performance of
machine-learning models (Boehmke & Greenwell, 2019), users may wish to
calculate custom features. All functions in the rabc package can also
process custom features once the user has added them to the feature data
frame with cbind or bind_cols (from the dplyr package).
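Adding a custom feature amounts to column-binding a new table onto the existing feature table. The Python sketch below mirrors what cbind or dplyr's bind_cols do, with each table held as a list of per-segment dicts; the helper names bind_cols and median_features, and the choice of the per-axis median as the custom feature, are hypothetical illustrations, not part of rabc.

```python
import statistics


def median_features(segment):
    """Hypothetical custom feature: the per-axis median of one segment."""
    return {f"{axis}_median": statistics.median(values) for axis, values in segment.items()}


def bind_cols(features, extra):
    """Column-bind two feature tables (one dict per segment), matching rows by position."""
    if len(features) != len(extra):
        raise ValueError("tables must have the same number of rows (segments)")
    # A later column with the same name overwrites an earlier one in this sketch.
    return [{**row, **new} for row, new in zip(features, extra)]


# Usage: an existing feature table plus custom columns computed from the same segments.
segments = [
    {"x": [0.1, 0.4, 0.2], "y": [1.0, 0.9, 1.1]},
    {"x": [2.0, 1.5, 1.8], "y": [0.2, 0.3, 0.1]},
]
features = [
    {"x_mean": statistics.mean(s["x"]), "y_mean": statistics.mean(s["y"])} for s in segments
]
combined = bind_cols(features, [median_features(s) for s in segments])
```

As with cbind, rows are matched purely by position, so the custom features must be computed on the same segments and in the same order as the existing feature table, and custom feature names should not clash with existing ones.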