Discussion
CNN Accessibility
This study demonstrates that AI-based identification and classification
models are more accessible than previously thought. Until now,
processing of camera trap images has been limited by the availability
of human observers, expense, processing time, and limited familiarity
with computer science techniques in ecological studies. Employing
labeling services (e.g. Google Cloud) can be unreliable for processing
large datasets, and having images labeled and processed currently costs
approximately $0.05 per image (Google Cloud), which may not be practical
when tens of thousands of images are involved.
An increasingly accurate and efficient method of image processing is
transfer training (e.g. Deepak and Ameer 2019, Swati et al. 2019, Shi et
al. 2019), which is an especially desirable technique for studies with
limited data (Shin et al. 2016). Despite improvements in this training
architecture, the use of these methods in ecology has been limited.
Transfer training saves time and reduces data requirements, allowing
smaller studies to spend less time on processing while still calibrating
the architecture with study-specific images and training the model on
only a portion of their complete dataset. Additionally, transfer training
helps prevent overfitting of the model, which can be an issue when using
a small number of images (Deepak and Ameer 2019, Han et al. 2018).
A smaller image set allows the model to be more flexible, making it more
applicable for ecologists than other advanced machine learning
techniques (Xie et al. 2016). Feature extraction with transfer training
offers camera trap projects an alternative to building a CNN
architecture from scratch, to relying on a pre-trained CNN product (e.g.
Microsoft MegaDetector), or to using unsupervised learning techniques
(e.g. cluster analysis).
By using open-source programs and premade neural nets, models can be
built to simply remove images without animals or to fully automate the
classification of species. This study, along with similar studies (e.g.
Tabak et al. 2019), provides evidence that a reliable identification and
classification model can be created with open-source tools (e.g.
TensorFlow) by using transfer learning and premade neural networks.
Further, we completed this process using a very limited set of images
and achieved encouraging results. This technology could be especially
desirable for researchers wishing to eliminate false positives as well
as to quickly sort and label species classes.
Calibration Analysis
Currently, accuracy is the standard metric to evaluate classification
models for camera trap studies (Gomez et al. 2016, Norouzzadeh et al.
2018, Swanson et al. 2015). We suggest the optimization of customized
models also be based on F-1-score rather than relying on accuracy alone,
because accuracy can be heavily biased by TNs (Wolf et al. 2006). This
bias is reflected in the greater than 20% difference between our test
accuracy (TNs excluded) and validation accuracy (TNs included).
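The inflation of accuracy by TNs can be illustrated with a small worked example. The sketch below is not the evaluation code used in this study; the counts are hypothetical and chosen only to show how the same detections score with and without TNs. It assumes at least one true positive, so no division by zero occurs.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all images handled correctly, true negatives included."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; true negatives play no part."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 1,000 images, 900 of them empty frames (TNs).
tp, fp, fn, tn = 50, 20, 30, 900

acc_with_tn = accuracy(tp, fp, fn, tn)     # 950 / 1000 = 0.95
acc_without_tn = accuracy(tp, fp, fn, 0)   # 50 / 100 = 0.50
f1 = f1_score(tp, fp, fn)                  # 100 / 150, about 0.667
```

With these illustrative counts, accuracy drops from 0.95 to 0.50 once the empty frames are excluded, while the F-1-score is unaffected by how many empty frames the dataset contains.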
The metrics used to optimize a model will depend on the purpose of the
project and the resources available to the researcher. The F-1-score can
be broken down into precision and recall, both of which can be optimized
for different purposes. In a study focusing on rare species (e.g.
Alexander et al. 2016, Karanth et al. 1995), recall should be
optimized to ensure the detection of all possible occurrences of
animals. Alternatively, precision should be optimized if processing time
is limited and every image of an animal is not essential for the global
analysis. Optimizing precision is ideal for a general survey of common,
easily identified animals (e.g. Chitwood et al. 2017).
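One way to express a preference between precision and recall in a single number is the F-beta score, a standard generalization of the F-1-score. It was not used in this study and is shown here only as a sketch. The two models and their precision and recall values are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta < 1 weights precision more.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Two hypothetical models with mirrored strengths.
high_recall = {"precision": 0.60, "recall": 0.95}
high_precision = {"precision": 0.95, "recall": 0.60}

for beta in (0.5, 1.0, 2.0):
    a = f_beta(**high_recall, beta=beta)
    b = f_beta(**high_precision, beta=beta)
    print(f"beta={beta}: high-recall model {a:.3f}, high-precision model {b:.3f}")
```

At beta = 1 the two hypothetical models tie. Raising beta favors the high-recall model and lowering it favors the high-precision model, so the choice of beta encodes the priorities of the project.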
Optimizing Model Performance
Analyzing model performance during training is especially useful for
determining which classes the model is failing to identify, and it is
easily visualized using intersection-over-union (IOU) graphs. Precision
during training did not seem to depend on the number of images used to
train each class; rather, the type of object the class refers to was
most important in determining model performance. Objects with unique
shapes, color patterns, and textures (e.g. turkey and armadillo) were
detected by the model more easily (Fig. ). The model was not as
successful with objects that were small and difficult to distinguish
from the background (e.g. grey squirrel), objects that were similar to
another class (e.g. coyote and dog), or classes in which the training
examples were highly variable (e.g. humans and vehicles).
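For readers unfamiliar with IOU, the quantity behind these graphs can be sketched in a few lines. The function below is a minimal illustration, not the implementation used by any particular detection library. It assumes boxes are given as (x_min, y_min, x_max, y_max) in pixel coordinates, and the two example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (10, 10, 50, 50)   # hypothetical labeled box
prediction = (30, 30, 70, 70)     # hypothetical model output
overlap = iou(ground_truth, prediction)   # 400 / 2800, about 0.143
```

An IOU of 1 means the predicted box matches the labeled box exactly, and 0 means the two do not overlap at all.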
Depending on the aim of the study, the choice of metric allows the
researcher to facilitate either an ID or CL model. Certain camera trap
studies benefit greatly from automating the removal of TNs, especially
when focusing on topics such as camera trap effectiveness (e.g.
Ferreira-Rodríguez et al. 2019, Edwards et al. 2016) or instances where
human-supervised processing will be required to extract details such as
behavior. To focus a model on detection of objects rather than
classification, researchers should focus on metrics associated with ID.
The use of this type of identification model would allow researchers to
decrease processing time and ensure detection of objects while not being
overly concerned with the accuracy of species classification by the
model. Alternatively, studies focusing on general ecosystem monitoring
(e.g. Steenweg et al. 2017, Jiménez et al. 2010) or density of common
species (e.g. Parsons et al. 2017) would benefit from a CL model, and
should use CL metrics to build a model fully capable of both identifying
and classifying species.
Several methods may be employed to adjust the model’s parameters.
Adjusting CTs is a simple way to tune a model toward the desired metric’s
optimal value. If optimization cannot be reached by adjustment of CTs,
the model can be further improved by adding images to classes which the
model consistently
predicts incorrectly. This will help the model learn from the dataset
and objects
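Assuming CTs denote the confidence thresholds applied to model detections, tuning them can be sketched as a simple sweep: keep only detections at or above each candidate threshold, score the result, and retain the best candidate. The detections, candidate values, and function names below are hypothetical and are not taken from this study's pipeline.

```python
def f1_at_threshold(detections, n_ground_truth, threshold):
    """F1 when only detections scoring at or above `threshold` are kept.

    detections: list of (confidence, is_correct) pairs.
    n_ground_truth: number of labeled objects in the evaluation set.
    """
    kept = [correct for conf, correct in detections if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = n_ground_truth - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(detections, n_ground_truth, candidates):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates,
               key=lambda t: f1_at_threshold(detections, n_ground_truth, t))


# Hypothetical detections: (model confidence, matches a labeled object?)
detections = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, False), (0.20, False),
]
n_ground_truth = 5  # one labeled object was never detected
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
ct = best_threshold(detections, n_ground_truth, candidates)
```

The same sweep can target precision, recall, or another metric by swapping the scoring function, which is how the choice of metric steers the final threshold.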
As biodiversity declines worldwide (Kolbert 2014), employing commonly
used computer science techniques in future camera trap studies will
greatly enhance our ability to monitor wild populations.
Conclusions
- Transfer training with bounding boxes is successful and requires far
fewer training images than traditional model building.
- Identification and classification models built using transfer training
and small image sets can be very successful with species that are
easily distinguished. Species that are more difficult to distinguish
can also be identified but require more training images.
- The traditional metric of accuracy can give a false sense of
confidence in a model because of inflation by true negatives. F-1
should be used for general purposes because it is not biased by true
negatives.
- Studies focusing on simply removing true negatives do not require as
  high model performance as studies attempting to classify species.