Our method predicts vertebra landmarks in an X-ray image in two stages. In the first stage, an object detection network predicts the position of each vertebra as a bounding box. In the second stage, a regression model estimates the four corner landmarks of each detected vertebra.
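Because the second stage works on each detected box separately, its corner predictions must be mapped back to full-image coordinates. The sketch below shows that flow under stated assumptions. `detect_vertebrae` and `regress_corners` are hypothetical stand-ins for the two trained models. It also assumes that the regressor outputs corner coordinates normalized to [0, 1] inside the crop, and that boxes are processed top to bottom. These notes specify neither convention.

```python
import numpy as np


def corners_to_image_coords(box, corners_norm):
    """Map 4 corner points predicted inside a crop back to full-image pixels.

    box: (x_min, y_min, x_max, y_max) of the crop, in image pixels.
    corners_norm: (4, 2) array of (x, y) in [0, 1] relative to the crop
        (assumed output convention of the stage-2 regressor).
    """
    x_min, y_min, x_max, y_max = box
    scale = np.array([x_max - x_min, y_max - y_min], dtype=float)
    offset = np.array([x_min, y_min], dtype=float)
    return np.asarray(corners_norm, dtype=float) * scale + offset


def two_stage_predict(image, detect_vertebrae, regress_corners):
    """Stage 1 detects vertebra boxes; stage 2 regresses 4 corners per box.

    detect_vertebrae(image) -> iterable of (x_min, y_min, x_max, y_max)
    regress_corners(crop) -> (4, 2) normalized corners
    Both callables are hypothetical placeholders for the trained models.
    """
    landmarks = []
    # Process boxes top to bottom (assumed ordering of the vertebrae).
    for box in sorted(detect_vertebrae(image), key=lambda b: b[1]):
        # Round to integer pixels so the crop and the mapping use the same box.
        x_min, y_min, x_max, y_max = (int(round(v)) for v in box)
        crop = image[y_min:y_max, x_min:x_max]
        corners = regress_corners(crop)
        landmarks.append(corners_to_image_coords((x_min, y_min, x_max, y_max), corners))
    return landmarks
```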

Dataset:

Note: the training-set landmarks contain a mistake in image "sunhl-1th-06-Jan-2017-182 A AP.jpg" (vertebrae 1 and 17).

Object Detection Stage:

We use Faster R-CNN as the object detection network.
Implementation: https://github.com/tryolabs/luminoth/tree/master/luminoth (TensorFlow implementation)

Experiment 1:

We used ResNet-101 as the base network to extract CNN features, initialized it with pre-trained weights, and fine-tuned the layers after block 2.
Optimizer: SGD with momentum 0.9, learning rate 0.0003
Anchors: base_size: 256, scales: [0.25, 0.5], ratios: [1, 2]
For the training set, we enlarged each ground-truth bounding box by 50 pixels in width and 10 pixels in height. The 50-pixel increase in width is meant to include part of the ribs as additional visual cues.
This configuration gave the best result so far.
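The anchor settings and the box enlargement above can be sketched as follows. The sketch makes two assumptions, and both should be checked against the detector's code and the preprocessing script. First, anchors use ratio = height / width, with the area fixed at (base_size * scale) squared. Second, the pixel increase is the total added to each dimension, split evenly between the two sides. It also assumes the ground-truth box is the tight box around the four corner landmarks. `box_from_corners` and `expand_box` are hypothetical helpers.

```python
import math


def anchor_shapes(base_size, scales, ratios):
    """(width, height) of each reference anchor.

    Assumes ratio = height / width and a constant area of (base_size * scale) ** 2.
    """
    shapes = []
    for ratio in ratios:
        for scale in scales:
            side = base_size * scale
            shapes.append((side / math.sqrt(ratio), side * math.sqrt(ratio)))
    return shapes


def box_from_corners(corners):
    """Tight axis-aligned box around the 4 corner landmarks (assumed construction)."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def expand_box(box, dw, dh, img_w, img_h):
    """Add dw to the total width and dh to the total height, half on each side
    (assumed), then clip the result to the image bounds."""
    x_min, y_min, x_max, y_max = box
    return (
        max(0.0, x_min - dw / 2),
        max(0.0, y_min - dh / 2),
        min(float(img_w), x_max + dw / 2),
        min(float(img_h), y_max + dh / 2),
    )


# Experiment 1 settings: base_size 256, scales [0.25, 0.5], ratios [1, 2].
print([(round(w, 1), round(h, 1)) for w, h in anchor_shapes(256, [0.25, 0.5], [1, 2])])
# prints [(64.0, 64.0), (128.0, 128.0), (45.3, 90.5), (90.5, 181.0)]
```

Under these assumptions, the Experiment 1 boxes correspond to `expand_box(box, 50, 10, img_w, img_h)`. The Experiment 4 setting corresponds to `expand_box(box, 10, 10, img_w, img_h)`.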

Experiment 2:

We limited the maximum number of detections per class to 17 and switched the optimizer to Adam. This did not perform well.
On the validation set:
Average Precision (AP) @ [0.50] = 0.920
Average Precision (AP) @ [0.75] = 0.566
Average Precision (AP) @ [0.50:0.95] = 0.526
Average Recall (AR) @ [0.50:0.95] = 0.603
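Capping the detector at 17 outputs amounts to keeping only the 17 highest-scoring boxes. The value of 17 was presumably chosen to match the 17 vertebrae annotated per image. Below is a minimal post-processing sketch using the hypothetical helper `keep_top_k`. Inside a Faster R-CNN detector the cap is typically applied after non-maximum suppression. Sorting the kept boxes top to bottom is an added assumption.

```python
import numpy as np


def keep_top_k(boxes, scores, k=17):
    """Keep the k highest-scoring boxes, then order them top to bottom.

    boxes: (N, 4) array of (x_min, y_min, x_max, y_max); scores: (N,) array.
    The top-to-bottom ordering by box centre y is an added assumption.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    keep = np.argsort(-scores)[:k]  # indices of the k best scores
    boxes, scores = boxes[keep], scores[keep]
    order = np.argsort((boxes[:, 1] + boxes[:, 3]) / 2)  # sort by centre y
    return boxes[order], scores[order]
```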
We then removed the detection limit while keeping Adam. Performance was still poor:
Average Precision (AP) @ [0.50] = 0.961
Average Precision (AP) @ [0.75] = 0.588
Average Precision (AP) @ [0.50:0.95] = 0.552
Average Recall (AR) @ [0.50:0.95] = 0.637
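In both runs, AP@[0.50] is much higher than AP@[0.75]. This suggests the vertebrae are usually found but their boxes are not tightly localized. AP@[0.50:0.95] follows the COCO convention: it averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes IoU for (x_min, y_min, x_max, y_max) boxes. It also lists the thresholds at which a single prediction would count as a match. `thresholds_passed` is a hypothetical helper. A full AP computation additionally needs score ranking over all detections.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# COCO-style AP@[0.50:0.95] averages AP over these 10 IoU thresholds.
IOU_THRESHOLDS = np.linspace(0.50, 0.95, 10)


def thresholds_passed(pred, gt):
    """IoU thresholds at which a single prediction would match its ground truth."""
    value = iou(pred, gt)
    return [round(float(t), 2) for t in IOU_THRESHOLDS if value >= t]
```

For example, a prediction that covers 78% of a ground-truth box matches at thresholds 0.50 through 0.75. It fails at 0.80 and above, so it contributes to AP@[0.50] but only partly to AP@[0.50:0.95].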

Experiment 3:

We changed the loss weighting to emphasize the regression loss over the classification loss (regression weight = 1.0, classification weight = 0.1).
On the validation set:
Average Precision (AP) @ [0.50] = 0.965
Average Precision (AP) @ [0.75] = 0.750
Average Precision (AP) @ [0.50:0.95] = 0.617
Average Recall (AR) @ [0.50:0.95] = 0.694
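Experiment 3 changes only the relative weights of the two loss terms. The minimal sketch below assumes smooth L1 for box regression and softmax cross-entropy for classification, the losses of the original Faster R-CNN formulation. The notes do not say whether the weights were applied to the RPN, the R-CNN head, or both. `detection_loss` is a hypothetical function, not part of the Luminoth API.

```python
import numpy as np


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-style) box-regression loss, summed over coordinates."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    return float(np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).sum())


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def detection_loss(cls_logits, cls_label, box_pred, box_target,
                   reg_weight=1.0, cls_weight=0.1):
    """Weighted sum of regression and classification losses (Experiment 3 weights)."""
    return (reg_weight * smooth_l1(box_pred, box_target)
            + cls_weight * cross_entropy(cls_logits, cls_label))
```

With a classification weight of 0.1, a given classification error contributes ten times less to the gradient than before. The optimizer therefore concentrates on box placement, which is the quantity that AP@[0.75] rewards.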

Experiment 4:

We kept SGD with momentum 0.9 and enlarged the ground-truth bounding boxes by 10 pixels in both width and height.
Result: SMAPE decreased.
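The notes do not state which quantities the SMAPE is computed over. It could be, for example, angles derived from the landmarks, or the landmark coordinates themselves. The sketch assumes one common sum-normalized variant. For each sample, it divides the summed absolute error by the summed predicted and true values, then averages over samples and expresses the result as a percentage. The function name `smape` and the input layout are assumptions, and other SMAPE definitions exist.

```python
import numpy as np


def smape(predicted, actual):
    """Sum-normalized SMAPE in percent, averaged over samples.

    predicted, actual: arrays of shape (n_samples, n_values) with non-negative
    values; a sample whose values are all zero would divide by zero.
    """
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(actual, dtype=float)
    per_sample = np.abs(p - a).sum(axis=1) / (p + a).sum(axis=1)
    return float(per_sample.mean() * 100.0)
```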