INTRODUCTION
Deep learning, a branch of machine learning, is an artificial intelligence
approach that has delivered record-breaking performance in pattern
recognition across multiple domains \cite{LeCun_2015}. Within a very short time it has revolutionized fields including robotics, automotive engineering (self-driving cars), finance, medicine, bioinformatics, games and consumer recommendations \cite{Shen_2017,Golden_2017,Min_2016,Heaton_2016,Esteva2019,Esteva2017}. The hype generated by these successes has
propelled its permeation into everyday computer applications and,
increasingly, into scientific disciplines, including life sciences such as biology and ecology \cite{Christin_2019}. Its
ease of use, flexibility and record-breaking accuracy have led to increased interest in its potential in a wide variety of fields \cite{Krizhevsky_2017,Hinton_2012,LeCun_2015}.
In ecology, however, the field is still in its infancy: a literature review
put the number of peer-reviewed and non-peer-reviewed papers at 46 as of
April 2018, mostly using CNNs and RNNs and none using the more recent deep
reinforcement learning \cite{Christin_2019}. This is despite its potential to revolutionize
applied ecology in the identification and classification of species,
behavioural studies, population monitoring and citizen science,
ecosystem management and conservation \cite{Christin_2019,Lamba_2019}.
Whereas other machine learning approaches have been used for acoustic
classification \cite{Aide_2013}, ecological modelling and the study of
animal behaviour \cite{Olden_2008,Valletta_2017,Christin_2019}, deep learning approaches have demonstrated the ability to
leapfrog the bottlenecks and complexity encountered when developing these
systems. One of the challenges of machine learning approaches is the
need for deep domain knowledge and above-average programming
skills, both of which are expensive and in short supply \cite{LeCun_2015,Christin_2019}.
Ecology is particularly ripe for the application of deep learning owing to the
explosion of complex ecological datasets over the past few years, ranging from
genomic to ecosystem-scale data, also known as Big Data. The Big Data
derived from increasingly sophisticated automatic monitoring by sensors can no
longer be processed manually, as the data are redundant and the task is tedious,
time-consuming and sometimes too complex for human beings to comprehend \cite{Weinstein_2017,Norouzzadeh_2018}; more efficient strategies are therefore needed.
Deep learning is particularly well suited, compared with other methods, to the
analysis of complex non-linear data, a challenge commonly encountered in
ecology \cite{Christin_2019}. In fact, all methods in the most recent LifeCLEF
contests have been deep learning-based \cite{Joly_2017}. Reviews and proposals for such applications have been put forward and the field
appears ripe for disruption \cite{Christin_2019,Lamba_2019}. Deep
learning has been touted as a contender for solving problems with immediate application, ranging from
illegal trafficking of wildlife products to large-scale automated
ecosystem management tools, areas that are expensive and logistically
demanding to manage \cite{Cantrell_2017,Christin_2019}.
Many of the bottlenecks of the past are
being eliminated by groundbreaking research on transfer learning and
data augmentation \cite{Shorten_2019}. This has reduced the amount of data required to build accurate, world-class models. Furthermore, the recent wave of
computer hardware innovation in GPUs and CPUs has accelerated progress by
reducing the cost of accessing the processing power required for
accurate model development, which is heavy on matrix multiplications. The overall move away from the AI winters of the
past and predictions of the “Singularity” also pose interesting
opportunities for the future. Life scientists such as ecologists therefore need to jump on this
bandwagon and take life science to the next level.
However, all is not rosy: deep learning is still theory-heavy and
difficult to implement, and domain scientists do not have the time or
expertise to delve into these powerful tools. Deep
learning is a complicated path that involves evaluating the tools,
parameters, datasets, training time and computing power. For these reasons, it remains siloed in well-funded research labs and big technology
companies that rarely have the incentive to publicize their processes.
Classical naturalists have identified species for the past two decades, laying the foundations of the ecological science that we thrive in today. To illustrate to non-experts how easily they can prototype these previously mysterious techniques, this paper walks the reader step by step through the various stages and offers open-source code in the form of an annotated Jupyter Notebook that anybody in the world can use to produce world-class
accuracy on whatever supervised species classification they want to carry out. The tutorial is designed so that it can be implemented in the lowest-resourced environment and
unlock applications in species identification in ecology the world over that we can hardly
imagine at the moment.
Previously, without tight synergies between computer science
professionals and ecologists, deep learning work on ecological datasets
proved difficult despite its obvious benefits. The few deep learning
practitioners are in such demand from far wealthier giant companies
that the less well-funded ecological world is starved of personnel. This roadmap
was popularized by the Fast.ai course created by Jeremy Howard and
Rachel Thomas, both scientists at the University of San Francisco data
science institute, and by their enthusiastic students, who are now implementing these algorithms in other fields and turning whole modes of thinking in traditional industries inside out. This paper sheds light on the radical shift that
has occurred in the accessibility of AI tools and demonstrates it by
outlining how to build a simple species classifier that has world-class
accuracy. The code can be accessed in the Jupyter Notebook here:
https://bit.ly/2SHz9LY
IMPLEMENTATION
Sea stars are important species in our understanding of marine invertebrate communities. The intertidal relationship between the sea star Pisaster ochraceus and the mussel Mytilus californianus was used to coin the term keystone species \cite{Paine_1966}. Following that classical study, it is therefore interesting to use sea stars as model species to prototype the classifier AI system. Figure \ref{488749} illustrates our workflow for achieving a minimum viable product: