2.4 Feature selection
Feature
selection is the process of selecting a subset of relevant features for
use in model building (Chakravarty, Cozzi, Ozgul, & Aminian, 2019). In
animal behaviour studies using ACC, tens of features are typically used
in model building (e.g., Shamoun-Baranes et al., 2012). Although this is a
relatively small number compared to many other machine-learning applications,
there may still be redundancy in the feature set. Redundant features show high
correlation with other features and are thus likely to contribute similarly to
the behaviour classification model. The feature set may also contain
"irrelevant" features that hardly contribute to the classification model.
Feature selection in this package serves three aims. First, fewer features
make the model easier to interpret; for instance, there may be biomechanical
connections between the selected features and the behaviours in the ultimate
classification model (e.g., Chakravarty et al., 2019). Second, fewer features
reduce the risk of overfitting and may thereby lead to better behaviour
classification from ACC data. Third, because of the lower computational
requirements of assessing behaviour from ACC data, reduced feature sets have
greater potential to be calculated on-board the ACC devices themselves, e.g.
light-weight tracking devices (e.g., Korpela et al., 2020; Nuijten, Gerrits,
Shamoun-Baranes, & Nolet, 2020), on which the results can either be stored or
relayed to receiving stations.
The
rabc package’s select_features function uses a
combination
of a filter and a wrapper feature selection method. The filter part
removes any redundant features based on the absolute values of the
pair-wise correlation coefficients between features. If two features
have a high correlation, the function looks at the absolute correlation
of each of the two features with all other features and removes the
feature with the largest mean absolute correlation value. The threshold
correlation coefficient (cutoff) is user-defined, with a default of
cutoff = 0.9. In the default configuration the filter is turned off
(i.e., filter = FALSE).
The purpose of the wrapper is to select the most relevant features. The
wrapper part applies stepwise forward selection (SFS) (Toloşi &
Lengauer, 2011) using the extreme gradient boosting (XGBoost) model,
which is not only used for feature selection but also for the final
classification model (see below). XGBoost is a scalable tree boosting
method that has been shown to outperform other tree boosting methods and
random forest (Chen & Guestrin, 2016). In our own experience, XGBoost is
fast to train and performs well with a limited number of trees.
The default limit to the number of features (no_features) is 5 but can be
user-defined. The no_features setting also determines how many rounds of
SFS are conducted. In the first round, each feature is individually used
to train a classification model with XGBoost. The feature yielding the
highest overall accuracy is added to the selected feature set. In every
following round, each remaining feature is combined with the selected
feature set to train a classification model, and the feature yielding the
highest accuracy is added to the selected feature set. The process stops
when the number of rounds equals the no_features setting.
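The SFS rounds described above can be sketched as follows. This is a conceptual Python sketch, not the package's R code; the `score` callback is a hypothetical stand-in for training an XGBoost classifier on the candidate feature subset and returning its overall accuracy.

```python
def stepwise_forward_selection(feature_names, score, no_features=5):
    """Greedy stepwise forward selection (SFS).

    In each round, every remaining feature is tentatively added to
    the selected set; the candidate whose model scores the highest
    accuracy is kept. Runs for no_features rounds.
    """
    selected = []
    accuracy_log = []  # per round: dict of candidate feature -> accuracy
    for _ in range(min(no_features, len(feature_names))):
        remaining = [f for f in feature_names if f not in selected]
        round_scores = {f: score(selected + [f]) for f in remaining}
        best = max(round_scores, key=round_scores.get)
        selected.append(best)
        accuracy_log.append(round_scores)
    return selected, accuracy_log
```

With a toy additive score, the strongest single feature is selected first and the best complement second, mirroring the rounds described above.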
The select_features function returns a list, of which the first element
(i.e., .[[1]]) contains a matrix providing the classification accuracy
for each of the features (columns) across all steps (rows, the top row
being the first step) of the SFS process. Once a feature has been selected
into the selected feature set, the remaining values in its column are set
to zero. The second element of the list (i.e., .[[2]]) contains the names
of the selected features in the order in which they were selected in the
SFS process. The development of the classification accuracy with each step
of the SFS process is plotted with the function plot_selection_accuracy
(Fig. 3). In the case of the White Stork dataset, we can see that after
the sixth selected feature, "z_variance", there is almost no further
improvement in classification accuracy with the addition of more features.
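The structure of the accuracy matrix described above can be mimicked from a per-round log of candidate accuracies. Again, this is a hypothetical Python sketch of the data structure only, not rabc code: rows are SFS steps, columns are features, and a feature's entries become zero once it has entered the selected set.

```python
def accuracy_matrix(feature_names, accuracy_log, selection_order):
    """Rows = SFS steps (top row = first step), columns = feature_names.

    A feature's accuracy appears in the rounds it was evaluated;
    once it has been selected, its remaining entries are zero.
    """
    matrix, chosen = [], set()
    for step, scores in enumerate(accuracy_log):
        matrix.append([scores.get(f, 0.0) if f not in chosen else 0.0
                       for f in feature_names])
        chosen.add(selection_order[step])
    return matrix
```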