Faster R-CNN TensorFlow Model

Building on the basics of convolutional neural networks (CNNs), Faster R-CNN feeds the features computed by the CNN into the Region Proposal Network (RPN) half of the model. The RPN uses these features to detect bounding boxes that are likely to contain the object(s) of interest. The model outputs bounding boxes, a label assigned to each box, and a probability (objectness score) for each label and box. The architecture of the model consists of the following components: Region Proposal Network, Anchors, Training/Loss, Region of Interest (RoI) Pooling, Region-Based CNN, and Post Processing.
A Region Proposal Network takes an image as input and outputs a set of rectangular object proposals, the regions believed to contain an object, each with an objectness score. It does this by sliding a small network over the feature map produced by the last convolutional layer. At each position, the small network takes a spatial window of the feature map and maps it to a lower-dimensional feature vector, followed by a ReLU activation. This feature is then fed into two sibling fully connected layers: a box-regression layer and a box-classification layer. Because the RPN shares its convolutional features with the detection network, the result is a single, unified network for object detection.
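The sliding-window step described above can be sketched in plain Python. This is a minimal illustration, not the TensorFlow implementation. The function names, the random weights, the 3 x 3 window, the toy hidden size of 8, and the k = 9 anchors per position (giving 2k scores and 4k box offsets) are all assumptions made for the example. No padding is used, so only positions where the full window fits are evaluated.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, bias, v):
    """Dense layer: weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def random_layer(rng, n_out, n_in):
    """Hypothetical helper: random weights stand in for trained ones."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def rpn_head(feature_map, n=3, hidden=8, k=9, seed=0):
    """Slide an n x n window over feature_map (H x W x C nested lists).

    Returns (scores, deltas), dicts keyed by window centre (row, col):
    2*k classification scores and 4*k box offsets per position.
    """
    rng = random.Random(seed)
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    shared = random_layer(rng, hidden, n * n * channels)  # window -> low-dim feature
    cls = random_layer(rng, 2 * k, hidden)                # box-classification layer
    reg = random_layer(rng, 4 * k, hidden)                # box-regression layer
    half = n // 2
    scores, deltas = {}, {}
    for r in range(half, height - half):
        for c in range(half, width - half):
            # Flatten the n x n x C window into one vector.
            window = [feature_map[r + dr][c + dc][ch]
                      for dr in range(-half, half + 1)
                      for dc in range(-half, half + 1)
                      for ch in range(channels)]
            feat = relu(linear(*shared, window))
            scores[(r, c)] = linear(*cls, feat)
            deltas[(r, c)] = linear(*reg, feat)
    return scores, deltas

# Toy 5 x 6 feature map with 4 channels.
fmap = [[[0.1 * (r + c + ch) for ch in range(4)] for c in range(6)]
        for r in range(5)]
scores, deltas = rpn_head(fmap)
```

Because the same three weight sets are reused at every window position, the two sibling layers behave like 1 x 1 convolutions applied on top of an n x n convolution, which is how such a head is typically expressed in a deep learning framework.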
Anchors are fixed-size reference bounding boxes centered on each sliding-window position, so they are placed uniformly across the original image. Anchors have two key properties. First, they are translation-invariant: if an object is translated within the image, its proposal translates with it, and the same function predicts the proposal at either location. Second, they are multi-scale: bounding boxes are classified and regressed with reference to anchor boxes of varying scales and aspect ratios, which lets the model handle objects of different sizes and shapes.
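As a rough illustration of how anchors tile an image, the sketch below generates one set of boxes per feature-map cell. The function name and the default values (a stride of 16 pixels, scales of 128, 256 and 512, and aspect ratios of 1:2, 1:1 and 2:1) are assumptions chosen for the example and are not taken from the text above.

```python
import math

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) tuples in image pixels.

    ratio is height / width, and every anchor has area scale ** 2.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

# A 2 x 3 feature map yields 2 * 3 * 9 anchors.
anchors = generate_anchors(2, 3)
```

The same nine box shapes are repeated at every cell, shifted only by the stride. That repetition is what makes the anchors translation-invariant, while the mix of scales and ratios provides the multi-scale coverage.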