2.3 Feature calculation
The next step is to calculate features from the ACC data. A feature is a mathematical summary (such as the mean or the standard deviation) of the ACC signal within a segment; the features form the input to the machine learning models (Brown et al., 2013). The functions calculate_feature_time and calculate_feature_freq calculate two basic feature sets. The first, the time-domain feature set, includes the mean, variance, standard deviation, maximum, minimum, range and ODBA (Overall Dynamic Body Acceleration). ODBA has been shown to correlate with the animal’s energy expenditure (Wilson et al., 2019). These features are calculated for each ACC axis separately (denoted with the prefix x, y or z in the output data frame), except for ODBA, which is calculated using all available axes.
The second, the frequency-domain feature set, includes the main frequency, main amplitude and frequency entropy. These features are likewise calculated for each ACC axis separately (denoted with the prefix x, y or z) and are based on the fast Fourier transform (FFT) of the raw ACC data. Frequency entropy here measures the unpredictability of the signal. Note that, owing to specific ACC sampling settings (e.g., Gilbert et al., 2016), some ACC datasets may not have a sampling frequency high enough to capture useful frequency information (Nathan et al., 2012). In these cases, it is better not to use frequency-domain features for behaviour classification.
The functions calculate_feature_time and calculate_feature_freq provide an essential but not exhaustive list of potential features. Because feature engineering can improve the performance of machine learning models (Boehmke & Greenwell, 2019), users may wish to calculate custom features.
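To make the two basic feature sets concrete, the following is a minimal sketch of how such features could be computed for a single segment. It is written in Python with the standard library only and is not the rabc implementation. The function names (time_features, odba, freq_features) are hypothetical, and several details are assumptions: the static acceleration in ODBA is approximated by the segment mean, the amplitude spectrum is taken from a naive discrete Fourier transform rather than an FFT, and frequency entropy is taken to be the Shannon entropy of the normalised amplitude spectrum.

```python
import cmath
import math
import statistics


def time_features(values):
    """Time-domain summary features for one ACC axis within one segment."""
    return {
        "mean": statistics.fmean(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "sd": statistics.stdev(values),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
    }


def odba(x, y, z):
    """ODBA sketch: mean over samples of the summed absolute dynamic acceleration.

    Assumption: the static component of each axis is approximated by the
    segment mean; a running mean over a chosen window is a common alternative.
    """
    axes = (x, y, z)
    means = [statistics.fmean(a) for a in axes]
    per_sample = [
        sum(abs(a[i] - m) for a, m in zip(axes, means)) for i in range(len(x))
    ]
    return statistics.fmean(per_sample)


def freq_features(values, samp_freq):
    """Frequency-domain features for one ACC axis (naive O(n^2) DFT).

    Only bins strictly between 0 Hz and the Nyquist frequency (samp_freq / 2)
    are used: a sampled signal cannot resolve frequencies above that limit.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 samples per segment")
    mean = statistics.fmean(values)
    centred = [v - mean for v in values]  # remove the 0 Hz component
    amps = []
    for k in range(1, (n - 1) // 2 + 1):
        coef = sum(
            c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centred)
        )
        amps.append(2 * abs(coef) / n)  # one-sided amplitude of bin k
    peak = max(range(len(amps)), key=amps.__getitem__)
    total = sum(amps)
    probs = [a / total for a in amps if a > 0]
    entropy = -sum(p * math.log(p) for p in probs)  # Shannon entropy, natural log
    return {
        "main_freq": (peak + 1) * samp_freq / n,  # frequency of the peak bin in Hz
        "main_amp": amps[peak],
        "freq_entropy": entropy,
    }
```

Under these assumptions, a pure 2 Hz sine wave sampled at 16 Hz for 2 s yields a main frequency of 2 Hz and a frequency entropy close to zero, because nearly all spectral amplitude falls in a single bin; a noisy signal spreads amplitude over many bins and gives a higher entropy.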
All functions in the rabc package can also process custom features once the user has added them to the feature data frame with the functions cbind or bind_cols (from the dplyr package).
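The idea of adding a custom feature as an extra column can be sketched as follows. This is an illustrative Python analogue, not rabc code: the custom feature (mean_crossings) and the helper bind_columns are hypothetical, and the feature table is represented as a list of dictionaries with one row per segment. Like cbind and bind_cols, the helper matches rows by position, so both tables must list the segments in the same order.

```python
import statistics


def mean_crossings(values):
    """Hypothetical custom feature: how often the signal crosses its segment mean."""
    mean = statistics.fmean(values)
    above = [v > mean for v in values]
    return sum(1 for a, b in zip(above, above[1:]) if a != b)


def bind_columns(features, extra):
    """Column-bind two row-aligned tables, matching rows by position."""
    if len(features) != len(extra):
        raise ValueError("both tables need one row per segment, in the same order")
    if features and set(features[0]) & set(extra[0]):
        raise ValueError("custom feature names must not clash with existing ones")
    return [{**f, **e} for f, e in zip(features, extra)]


# Two toy segments of x-axis data (made-up values, for illustration only).
segments_x = [[0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
basic = [{"x_mean": statistics.fmean(s)} for s in segments_x]
custom = [{"x_mean_crossings": mean_crossings(s)} for s in segments_x]
table = bind_columns(basic, custom)
```

Checking the row count before binding guards against the main risk of position-based binding, namely silently attaching a custom feature to the wrong segment.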