The benefit of implementing a Faster RCNN TensorFlow model is that the training time required to produce a checkpoint file with acceptable loss values is roughly half that of the SSD MobileNet TensorFlow model. In addition, it detects more objects per image and achieves higher detection accuracy in comparison. However, the model is unsuitable for real-time object detection (e.g. on mobile smartphones) because it lacks the depthwise and pointwise convolutional layers, which also contributes to a slower detection rate.
SSD (Single Shot Detector) MobileNet v1 Architecture
The Single Shot Detector MobileNet architecture is built on depthwise separable convolutions, a form of factorized convolution that splits a standard convolution into a depthwise convolution and a 1×1 convolution known as a pointwise convolution. The MobileNet model applies a single filter to each input channel to begin feature extraction; a 1×1 pointwise convolution then combines the outputs of the depthwise convolution. The factorization thus separates the computation into two layers, one for filtering and one for combining. Splitting filtering and combination in this way reduces the model size and its computational demands (Howard et al., 2017). Moreover, depthwise convolution improves model efficiency by limiting GPU consumption on less powerful devices (e.g. mobile devices and laptops). However, the reduced GPU utilization also creates an imbalance during training, slowing progress and lengthening training intervals. Figure 1.3 below exhibits the successive pooling layers of the SSD MobileNet model and the condensing of pointwise and depthwise outputs:
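To make the factorization concrete, the following is a minimal TensorFlow/Keras sketch of one depthwise separable block as described above: a per-channel depthwise convolution (the filtering layer) followed by a 1×1 pointwise convolution (the combining layer). It is illustrative only, not the actual SSD MobileNet v1 implementation; the layer sizes, the 224×224 input shape, and the function name are assumptions for the example.

    import tensorflow as tf

    def depthwise_separable_block(inputs, pointwise_filters, strides=1):
        # Depthwise convolution: one 3x3 filter applied to each input
        # channel independently (the "filtering" layer).
        x = tf.keras.layers.DepthwiseConv2D(
            kernel_size=3, strides=strides, padding="same", use_bias=False)(inputs)
        x = tf.keras.layers.BatchNormalization()(x)
        # ReLU6 activation, as used in common MobileNet v1 implementations.
        x = tf.keras.layers.ReLU(max_value=6.0)(x)
        # Pointwise convolution: a 1x1 convolution that mixes the
        # depthwise outputs across channels (the "combining" layer).
        x = tf.keras.layers.Conv2D(
            filters=pointwise_filters, kernel_size=1, padding="same",
            use_bias=False)(x)
        x = tf.keras.layers.BatchNormalization()(x)
        return tf.keras.layers.ReLU(max_value=6.0)(x)

    # Example: one block mapping a 32-channel feature map to 64 channels.
    inputs = tf.keras.Input(shape=(224, 224, 32))
    outputs = depthwise_separable_block(inputs, pointwise_filters=64)
    model = tf.keras.Model(inputs, outputs)
    model.summary()

For the block above, a standard 3×3 convolution mapping 32 channels to 64 would require 3·3·32·64 = 18,432 weights, whereas the factorized version requires only 3·3·32 + 1·1·32·64 = 2,336 (ignoring batch-normalization parameters), roughly an eight-fold reduction, which is the model-size saving described by Howard et al. (2017).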