Keywords:
Artificial Intelligence, Deep Learning, Species Classification, Neural Network, Pattern Recognition, Big Data
Introduction
Deep learning, a branch of machine learning, is an artificial intelligence
approach that has been used for pattern
recognition across multiple domains \cite{Shen_2017,Golden_2017,Min_2016,Heaton_2016,Esteva2019,Esteva2017}. Other machine learning approaches have been used for acoustic classification \cite{Aide_2013}, ecological modelling and the study of animal behaviour \cite{Olden_2008,Valletta_2017,Christin_2019}, but deep learning has demonstrated the ability to overcome several of their limitations. One challenge of conventional machine learning is the need for substantial domain knowledge and high-level programming skills \cite{LeCun_2015,Christin_2019,inproceedings,NIPS2014_5347}. Further, feature engineering in machine learning is a complex and often tedious task that discourages many from using these techniques. Deep learning removes this step because the algorithm learns the relevant features automatically from the data \cite{inbook}.
In ecology, however, the use of deep learning is still in its infancy. This is despite its potential to revolutionise
applied ecology in the identification and classification of species,
behavioural studies, population monitoring and citizen science,
ecosystem management and conservation \cite{Christin_2019,Lamba_2019,Miao_2019,Ditria_2019}. Research articles continue to report new and interesting applications \cite{Terry_2020,Talas_2019,Priyadarshani_2020}. However, the techniques used remain cryptic and inaccessible to most ecologists, who are experts in their own domains but have no experience with these methods.
Ecology is particularly ripe for applications of deep learning owing to the growth over the past few years of complex ecological datasets, ranging from
genomic to ecosystem-scale data and often referred to as Big Data. The Big Data
derived from increasingly sophisticated automatic monitoring by sensors can no
longer be processed manually, as doing so is repetitive and time
consuming \cite{Weinstein_2017,Norouzzadeh_2018}.
Deep learning is particularly well suited to the non-linear,
complex data commonly encountered in
ecology \cite{Christin_2019}. In fact, all winning methods for the most recent LifeCLEF
contests have been deep learning-based \cite{Joly_2017}. Reviews and proposals for such applications have been put forward and the field
appears ripe for disruption \cite{Christin_2019,Lamba_2019}. Deep
learning has been touted as a contender for solving problems with immediate application, ranging from
illegal trafficking of wildlife products to large-scale automated
ecosystem management tools, areas that are expensive and logistically
demanding to manage \cite{Cantrell_2017,Christin_2019}.
Many of the challenges that prevented deep learning from having practical applications have been eliminated by advances in research on transfer learning and
data augmentation \cite{Shorten_2019}. This has reduced the amount of data required to build accurate models. Furthermore, the recent wave of
computer hardware innovation in GPUs and CPUs has also accelerated progress by
reducing the cost of accessing the processing power required for
accurate model development.
Naturalists have been identifying species for the past two centuries, laying the foundations of ecological science. However, even today, most taxonomic and species identification work is still manual and reliant on a few domain experts. Therefore, to show non-experts how they can prototype these previously mysterious techniques, this paper walks through the various stages step by step and offers open-source code in the form of an annotated Jupyter Notebook that anybody can use to obtain expert-level accuracy on a supervised species classification task of their choice. The tutorial is designed so that it can be implemented in low-resource environments and could enable applications in image-based taxon identification in ecology worldwide that are hard to foresee at the moment.