2.1 ACC dataset preparation and behaviour labels
Continuous ACC data need to be divided into segments that can be translated into meaningful behaviours. Raw ACC data can be segmented in two ways: even-length segmentation and variable-length segmentation (Bom, Bouten, Piersma, Oosterbeek, & van Gils, 2014). Variable-length segmentation requires an algorithm to detect behaviour change points and may thus be prone to error. Even-length segmentation does not require these additional calculations and is therefore much easier to implement. However, some even-length ACC segments will inevitably contain behaviour change points (and thus multiple behaviours), which affects downstream processing and behaviour classification. An ACC segment should be long enough to contain sufficient data to be representative of a behaviour (and, thus, interpretable as a specific behaviour type), yet short enough to avoid the inclusion of multiple behaviours as much as possible. We recommend retaining the inevitable segments in which behaviour transitions take place when training the model. Although these data might decrease the accuracy of the classification model, they will make the model more robust and avoid overestimating model performance.
The rabc package only supports even-length segmented data. The input data should be a data.frame or tibble containing the raw ACC data together with the behaviour associated with each segment. Each row represents one segment, and all rows should be of equal length. For tri-axial ACC data, each row should be arranged as “x,y,z,x,y,z,…,behaviour”, where “behaviour” is the (primary) behaviour observed during that segment. For dual-axial ACC data, each row should be arranged as “x,y,x,y,…,behaviour” and for single-axial ACC data as “x,x,…,behaviour”.
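As an illustration of the row layout described above, the following base R sketch cuts a continuous tri-axial record into even-length segments and interleaves the axes as x,y,z,x,y,z,…,behaviour. The function segment_acc and its arguments are hypothetical and not part of rabc. Labelling each segment with its most frequent behaviour is only one possible reading of “primary” behaviour and is an assumption made here.

```r
# Hypothetical helper (not part of rabc): even-length segmentation of a
# continuous tri-axial ACC record into rows of x,y,z,x,y,z,...,behaviour.
segment_acc <- function(x, y, z, behaviour, seg_len = 40) {
  stopifnot(length(x) == length(y), length(y) == length(z),
            length(z) == length(behaviour))
  n_seg <- length(x) %/% seg_len   # an incomplete trailing segment is dropped
  stopifnot(n_seg >= 1)

  # Sample indices belonging to segment i
  seg_idx <- function(i) ((i - 1) * seg_len + 1):(i * seg_len)

  rows <- lapply(seq_len(n_seg), function(i) {
    idx <- seg_idx(i)
    # rbind() gives a 3 x seg_len matrix; reading it column by column
    # interleaves the axes: x1, y1, z1, x2, y2, z2, ...
    as.vector(rbind(x[idx], y[idx], z[idx]))
  })
  acc <- as.data.frame(do.call(rbind, rows))
  names(acc) <- paste0(c("x", "y", "z"), rep(seq_len(seg_len), each = 3))

  # Assumption: the segment label is the most frequent behaviour in the
  # segment (ties resolved by the first level in alphabetical order).
  acc$behaviour <- vapply(seq_len(n_seg), function(i) {
    names(which.max(table(behaviour[seg_idx(i)])))
  }, character(1))
  acc
}

# Synthetic example: 100 samples, behaviour changes after sample 45
set.seed(1)
x_raw <- rnorm(100); y_raw <- rnorm(100); z_raw <- rnorm(100)
beh_raw <- rep(c("STND", "WALK"), times = c(45, 55))
demo <- segment_acc(x_raw, y_raw, z_raw, beh_raw, seg_len = 40)
dim(demo)        # 2 segments, 3 * 40 + 1 = 121 columns
demo$behaviour   # the second segment spans the STND-to-WALK transition
```

In this synthetic example the second segment contains a behaviour change point and is labelled by its dominant behaviour, which is the kind of transition segment recommended above for retention in model training.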
The tri-axial ACC demo dataset used here, from white storks (Ciconia ciconia) (data accessible from the AcceleRater website: http://accapp.move-ecol-minerva.huji.ac.il/, see Resheff et al., 2014), was measured at 10.54 Hz. Forty tri-axial measurements, totalling approximately 3.8 seconds, were used to form a behaviour segment. The dataset includes 1746 segments, each forming a row in the dataset. Each row contains 121 columns. The first 120 columns are ACC measurements from three orthogonal axes, arranged as x,y,z,x,y,z,…,x,y,z. The final column is of type character and contains the corresponding behaviour. The dataset contains five different behaviours: “A_FLIGHT” (active flight, 77 cases), “P_FLIGHT” (passive flight, 96), “WALK” (walking, 437), “STND” (standing, 863) and “SITTING” (sitting, 273).
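The figures in this description follow from one another, as the short base R sketch below shows using only the numbers stated above. The function check_acc_layout is a hypothetical helper, not part of rabc, that sketches how a data.frame could be checked against the expected layout before use.

```r
# Values taken from the dataset description
sampling_hz         <- 10.54
samples_per_segment <- 40
n_axes              <- 3

segment_seconds <- samples_per_segment / sampling_hz   # about 3.8 s
n_columns       <- n_axes * samples_per_segment + 1    # 120 ACC columns + 1 label

counts <- c(A_FLIGHT = 77, P_FLIGHT = 96, WALK = 437, STND = 863, SITTING = 273)
n_segments <- sum(counts)                               # 1746 segments
round(100 * counts / n_segments, 1)                     # class shares in percent

# Hypothetical helper (not part of rabc): TRUE if all columns but the last
# are numeric, their number is a multiple of n_axes, and the last column
# is of type character.
check_acc_layout <- function(df, n_axes = 3) {
  n_acc <- ncol(df) - 1
  if (n_acc < 1) return(FALSE)
  acc_ok <- all(vapply(df[seq_len(n_acc)], is.numeric, logical(1)))
  n_acc %% n_axes == 0 && acc_ok && is.character(df[[ncol(df)]])
}

# Toy one-sample segment, for illustration only
toy <- data.frame(x1 = 0.1, y1 = 0.2, z1 = 0.9, behaviour = "STND",
                  stringsAsFactors = FALSE)
check_acc_layout(toy)
```

The class shares make the imbalance of the dataset explicit: standing accounts for roughly half of all segments, whereas the two flight behaviours together account for about a tenth.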