Our method predicts vertebra landmarks in an X-ray image in two stages. In the first stage, an object detection network predicts the position of each vertebra as a bounding box. In the second stage, a regression model estimates the four corner landmarks of each detected vertebra.
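The two stages can be chained as in the minimal sketch below. The interfaces are assumptions, not the actual models: `detect` stands in for the stage-1 detector and returns boxes, and `regress` stands in for the stage-2 model and is assumed to return the four corners normalized to [0, 1] inside the box. The names `predict_landmarks`, `detect` and `regress` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in image pixels
Point = Tuple[float, float]              # (x, y)


def predict_landmarks(
    image,
    detect: Callable[[object], Sequence[Box]],
    regress: Callable[[object, Box], Sequence[Point]],
) -> List[List[Point]]:
    """Hypothetical two-stage pipeline.

    detect(image) -> vertebra bounding boxes (stage 1).
    regress(image, box) -> 4 corners normalized to [0, 1] within the box (stage 2, assumed).
    Returns the 4 corners of each vertebra in image coordinates, ordered top to bottom.
    """
    # Order vertebrae by the vertical center of their boxes.
    boxes = sorted(detect(image), key=lambda b: (b[1] + b[3]) / 2)
    results = []
    for x0, y0, x1, y1 in boxes:
        w, h = x1 - x0, y1 - y0
        corners = regress(image, (x0, y0, x1, y1))
        # Map box-relative corners back to image pixel coordinates.
        results.append([(x0 + u * w, y0 + v * h) for u, v in corners])
    return results
```

Keeping the regressor's output relative to the box lets the second stage work on a per-vertebra crop of fixed size, independent of where the vertebra lies in the full image.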
Dataset:
Note: the training-set landmarks contain an error in image "sunhl-1th-06-Jan-2017-182 A AP.jpg", vertebrae 1 and 17.
Object Detection Stage:
We use Faster R-CNN as the object detection network.
Experiment 1:
We used ResNet-101 as the backbone for extracting CNN features, starting from pre-trained weights and fine-tuning only the layers after block 2.
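Freezing everything up to block 2 can be expressed as a filter on parameter names. The sketch below assumes torchvision-style ResNet parameter names (`conv1.`, `bn1.`, `layer1.` to `layer4.`) and assumes that "block 2" corresponds to `layer2`; the helper `is_trainable` and the prefix tuple are hypothetical.

```python
# Assumption: torchvision ResNet naming, with "block 2" mapped to layer2.
FROZEN_PREFIXES = ("conv1.", "bn1.", "layer1.", "layer2.")


def is_trainable(param_name: str, frozen_prefixes=FROZEN_PREFIXES) -> bool:
    """Return False for the stem and the first two residual stages, True otherwise."""
    return not param_name.startswith(frozen_prefixes)


# In PyTorch this would be applied as:
#   for name, p in backbone.named_parameters():
#       p.requires_grad = is_trainable(name)
```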
Optimizer: SGD with momentum 0.9, learning rate 0.0003
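For reference, one SGD-with-momentum step with these hyperparameters looks like the sketch below. It uses the common form v ← μ·v + g, w ← w − lr·v (as in PyTorch's `torch.optim.SGD` without dampening or Nesterov); weight decay is omitted, and the function name is hypothetical.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.0003, momentum=0.9):
    """One SGD-with-momentum update on plain lists: v <- mu*v + g; w <- w - lr*v."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_weights = [w - lr * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

With a constant gradient, the velocity grows toward g / (1 − 0.9) = 10·g, so the effective step size approaches about 10 × 0.0003.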
Anchors: base_size 256, scales [0.25, 0.5], ratios [1, 2]
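The anchor shapes implied by these settings can be computed as below. This is a sketch under two assumptions that vary between Faster R-CNN implementations: the ratio is taken as height / width, and each ratio preserves the area (base_size × scale)². The function name is hypothetical.

```python
import math


def make_anchor_shapes(base_size=256, scales=(0.25, 0.5), ratios=(1, 2)):
    """Return (width, height) for every scale/ratio pair.

    Assumes ratio = height / width and constant area (base_size * scale) ** 2.
    """
    shapes = []
    for scale in scales:
        side = base_size * scale  # 64 and 128 pixels for the default scales
        for ratio in ratios:
            shapes.append((side / math.sqrt(ratio), side * math.sqrt(ratio)))
    return shapes
```

Under these assumptions the four anchors per location are 64×64, about 45×91, 128×128 and about 91×181 pixels (width × height) at the network's input resolution.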
In the training set, we enlarged each ground-truth bounding box by 50 pixels in width and 10 pixels in height. The larger horizontal margin includes part of the ribs as an additional visual cue.
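A minimal sketch of the box enlargement, assuming the (50, 10) pixels are the total increase split evenly between the two sides and that boxes are clipped to the image. Both points are assumptions, and `enlarge_box` is a hypothetical helper.

```python
def enlarge_box(box, dw=50, dh=10, img_w=None, img_h=None):
    """Grow (x_min, y_min, x_max, y_max) by dw in width and dh in height in total.

    The growth is split evenly between opposite sides (assumption); if the image
    size is given, the result is clipped to the image bounds.
    """
    x0, y0, x1, y1 = box
    x0, x1 = x0 - dw / 2, x1 + dw / 2
    y0, y1 = y0 - dh / 2, y1 + dh / 2
    if img_w is not None:
        x0, x1 = max(0.0, x0), min(float(img_w), x1)
    if img_h is not None:
        y0, y1 = max(0.0, y0), min(float(img_h), y1)
    return (x0, y0, x1, y1)
```

The Experiment 4 setting corresponds to `enlarge_box(box, dw=10, dh=10)`.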
This configuration gave the best results so far.
Experiment 2:
We limited the maximum number of detections per class to 17 and switched the optimizer to Adam. This did not perform well.
Validation set results:
Average Precision (AP) @ [0.50] = 0.920
Average Precision (AP) @ [0.75] = 0.566
Average Precision (AP) @ [0.50:0.95] = 0.526
Average Recall (AR) @ [0.50:0.95] = 0.603
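The detection cap in the run above keeps only the highest-scoring boxes of each class. A minimal sketch, assuming detections arrive as hypothetical `(class_id, score, box)` tuples; with a single "vertebra" class this is effectively a cap of 17 boxes per image.

```python
def cap_detections(detections, max_per_class=17):
    """Keep at most max_per_class highest-scoring detections of each class.

    detections: list of (class_id, score, box) tuples (hypothetical format).
    """
    by_class = {}
    for det in detections:
        by_class.setdefault(det[0], []).append(det)
    kept = []
    for dets in by_class.values():
        dets.sort(key=lambda d: d[1], reverse=True)  # highest score first
        kept.extend(dets[:max_per_class])
    return kept
```

The cap only discards lower-scoring boxes; it cannot recover a vertebra the detector missed.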
We then removed the detection limit but kept Adam; performance was still poor. Validation set results:
Average Precision (AP) @ [0.50] = 0.961
Average Precision (AP) @ [0.75] = 0.588
Average Precision (AP) @ [0.50:0.95] = 0.552
Average Recall (AR) @ [0.50:0.95] = 0.637
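AP@[0.50:0.95] in the COCO evaluation is the mean AP over the ten IoU thresholds 0.50, 0.55, ..., 0.95, so it rewards tight box alignment much more than AP@0.50 does. The sketch below computes the IoU of two boxes and lists those thresholds; the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds averaged in AP@[0.50:0.95].
THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]
```

For a hypothetical 100×40 ground-truth box and a prediction that is 8 pixels too short, IoU is 0.8: the detection counts as a match at 7 of the 10 thresholds but fails at 0.85, 0.90 and 0.95.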
Experiment 3:
We reweighted the loss function to emphasize the bounding-box regression loss over the classification loss: regression weight 1.0, classification weight 0.1.
Validation set results:
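The reweighted objective is a weighted sum of the two terms. The sketch below uses smooth L1 for box regression and softmax cross-entropy for classification, which are the loss types of the original Faster R-CNN; it omits the per-batch normalization, and which of the RPN and detection-head losses were reweighted here is not stated. All function names are hypothetical.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def detection_loss(box_pred, box_target, logits, label, w_reg=1.0, w_cls=0.1):
    """Weighted sum: w_reg * regression loss + w_cls * classification loss."""
    return w_reg * smooth_l1(box_pred, box_target) + w_cls * cross_entropy(logits, label)
```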
Average Precision (AP) @ [0.50] = 0.965
Average Precision (AP) @ [0.75] = 0.750
Average Precision (AP) @ [0.50:0.95] = 0.617
Average Recall (AR) @ [0.50:0.95] = 0.694
Experiment 4:
We kept SGD with momentum 0.9 and enlarged each ground-truth bounding box by 10 pixels in both width and height.
This decreased the symmetric mean absolute percentage error (SMAPE).
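SMAPE has several variants, and the quantities it was computed on here (landmark coordinates or derived angles) are not stated. The sketch below shows one common per-image form: the sum of absolute errors divided by the sum of predicted and true values, averaged over images and reported in percent. The function name and input layout are assumptions.

```python
def smape(predictions, targets):
    """Symmetric mean absolute percentage error, in percent.

    predictions, targets: per-image lists of non-negative values.
    Per image: sum|pred - true| / sum(pred + true); then averaged over images.
    """
    ratios = []
    for pred, true in zip(predictions, targets):
        num = sum(abs(p - t) for p, t in zip(pred, true))
        den = sum(p + t for p, t in zip(pred, true))
        ratios.append(num / den if den > 0 else 0.0)
    return 100.0 * sum(ratios) / len(ratios)
```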