The species image data is arranged into subfolders, one per species class, each containing different images of that species. The data is then uploaded as a folder to the platform we are going to use. In our case study, we uploaded our data to Google Drive and accessed it from Google Colab by running the first line of code in the Jupyter notebook.
Refer to #CODE BLOCK 2# on Jupyter Notebook
This connects your data on Google Drive to your Jupyter notebook session running on Google Colab. You may need to enter an authorization code in the cell output to grant the necessary permissions.
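For reference, mounting Google Drive in a Colab session is typically done as sketched below; the mount point /content/drive is the Colab default.

\begin{verbatim}
# Mount Google Drive inside the Colab runtime; Colab prints a link
# and asks for an authorization code the first time this runs.
from google.colab import drive
drive.mount('/content/drive')
\end{verbatim}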

Organizing Data

The next step is to save the data path as a path variable and use that path to create the training and validation datasets.
Refer to #CODE BLOCK 3# on Jupyter Notebook
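As an illustrative sketch (the folder name sea_stars is a placeholder, and the exact Drive path depends on where the data was uploaded), the path can be stored using Python's pathlib:

\begin{verbatim}
from pathlib import Path

# Point at the image folder uploaded to Google Drive; it contains
# one subfolder per species class. The folder name is hypothetical.
path = Path('/content/drive/MyDrive/sea_stars')
\end{verbatim}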
The training dataset is what the deep learning algorithm uses to learn the features that distinguish one sea star class from another. The validation set guards against overfitting and ensures that our model can generalize to sea stars it has not seen before. Overfitting is, simply put, the model "cramming into memory" the sea stars in the training set; it can produce misleading results in which model accuracy is high but the model cannot generalize to sea stars outside the original training dataset \cite{article,ripley_1996}.
The split depends on the dataset, but the training set is commonly around 60-80% of the data and the validation set around 20%. You can automate this split using a built-in fast.ai function that creates the data subsets fed into the neural network to build the species classification model, as illustrated by the following code blocks:
Refer to #CODE BLOCK 4# on Jupyter Notebook
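A minimal sketch of such a split, assuming the fastai v2 ImageDataLoaders API and an 80/20 split (the resize to 224 pixels is a common default rather than a value taken from the notebook):

\begin{verbatim}
from fastai.vision.all import *

# Build training/validation DataLoaders from the class subfolders,
# holding out 20% of the images for validation and resizing every
# image to 224x224 pixels.
dls = ImageDataLoaders.from_folder(
    path, valid_pct=0.2, seed=42, item_tfms=Resize(224))
\end{verbatim}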
We can verify what we have in our data block by viewing the images and exploring the basic statistics of the data folders created by the built-in fast.ai function, running the following code blocks:
Refer to #CODE BLOCKS 5, 6, 7# on Jupyter Notebook
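With the fastai v2 API, and continuing from the sketch above, these checks might look like:

\begin{verbatim}
# Display a grid of sample images with their class labels.
dls.show_batch(max_n=9)

# Basic statistics: class names and split sizes.
print(dls.vocab)                              # the species classes
print(len(dls.train_ds), len(dls.valid_ds))   # train/valid counts
\end{verbatim}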
We can see that our data has 3 classes, with 219 images in the training dataset and 54 in the validation dataset. We can also view the images together with their respective labels.

III. DEEP LEARNING PHASE

Deep Learning Model Creation

In our case study, we have a computer vision problem, so we will use Convolutional Neural Networks (CNNs) \cite{Krizhevsky_2017}. Specifically, we use the ResNet-34 architecture, which is not overly complex yet delivers strong results compared to other architectures \cite{he2015deep}. We use accuracy as the evaluation metric for our model. We then call the built-in CNN learner from the fast.ai library, passing in our data, the ResNet-34 architecture, and the metric we are measuring, as illustrated in the following code blocks:
Refer to #CODE BLOCK 8# on Jupyter Notebook
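Assuming the fastai v2 API, the learner construction sketched here mirrors that step:

\begin{verbatim}
# Create a CNN learner from our DataLoaders, using a ResNet-34
# backbone pretrained on ImageNet and tracking accuracy.
learn = cnn_learner(dls, resnet34, metrics=accuracy)
\end{verbatim}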
We can then train it for one cycle, passing in the number of epochs we want the model to run for. In our case, we ran for 10 epochs and achieved an accuracy of about 85%.
Refer to #CODE BLOCK 9# on Jupyter Notebook
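In fastai this corresponds to a call such as:

\begin{verbatim}
# Train with the one-cycle learning rate policy for 10 epochs.
learn.fit_one_cycle(10)
\end{verbatim}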
Too few or too many epochs can both be a problem; it is best to stop when the error rate no longer decreases or, when using an accuracy metric, when accuracy no longer increases. We then save the model, as the accuracy may already be good enough for the next stage of inference and deployment, as illustrated by the following code blocks:
Refer to #CODE BLOCK 10# on Jupyter Notebook.
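A sketch of the save step; the checkpoint name is a placeholder:

\begin{verbatim}
# Persist the trained weights so they can be reloaded later for
# inference or further fine-tuning.
learn.save('seastar-stage-1')
\end{verbatim}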

Model Tuning 

If there is a need to create a more accurate model, particularly when working on benchmark problems, there are techniques such as retraining an already-trained model and automatically searching for a suitable learning rate with the built-in learning rate finder provided by fast.ai. These tuning techniques can be found in the following code blocks:
Refer to #CODE BLOCKS 11, 12, 13# on Jupyter Notebook
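A sketch of this tuning loop with the fastai v2 API; the learning rate range shown is illustrative, and in practice is read off the learning rate finder's plot:

\begin{verbatim}
# Unfreeze all layers so the pretrained backbone is also retrained.
learn.unfreeze()

# Plot loss against learning rate to pick a sensible range.
learn.lr_find()

# Retrain with discriminative learning rates: smaller for early
# layers, larger for later ones. The range here is illustrative.
learn.fit_one_cycle(5, lr_max=slice(1e-5, 1e-3))
\end{verbatim}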
Then, using that learning rate, you can fit further cycles until the model reaches a suitable accuracy. Using this approach for our species classifier, we were able to improve the accuracy to 87%.

Model Tuning by Data Exploration

With a trained model, we can now investigate particular images that might be reducing the model's accuracy. First, we draw a confusion matrix of the species to map the correct and incorrect classifications and see which species are making the model less accurate \cite{Ting2017}. In our case, according to Fig. 3, we misclassified 4% of Pisaster ochraceus, 1% of Pycnopodia helianthoides, and 2% of S. dawsoni. We also examined each misclassified image to check the original data curation accuracy (Fig. 4).
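Assuming the fastai interpretation API, the confusion matrix and the most-confused images can be produced as follows:

\begin{verbatim}
# Collect predictions on the validation set and summarize errors.
interp = ClassificationInterpretation.from_learner(learn)

# Confusion matrix of true versus predicted species (cf. Fig. 3).
interp.plot_confusion_matrix()

# Show the images with the highest loss (cf. Fig. 4), which helps
# spot mislabeled or low-quality images from data curation.
interp.plot_top_losses(9)
\end{verbatim}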