Strengths and limitations
Although developing content validity for procedure-specific assessment tools requires Delphi methodology, a consensus-based approach,30 only Frederick et al. did so among the eight studies in our scoping review.20 They discussed the potentially confounding effect of the attending physician providing direct supervision and guidance while a novice surgeon is being evaluated. This may account for why novice surgeons’ RHAS20 scores did not differ more from those of their more experienced colleagues. Case mix may also explain this lack of difference in scores.
RHAS 20 demonstrated both construct and discriminative validity and appears to be feasible. It is argued that many of the skills RHAS measures can also be applied to hysterectomies performed either laparoscopically or abdominally, as the basic steps in the procedure are identical. The study’s intent is to facilitate surgical training by tracking progress over time and to give immediate and constructive feedback to trainees.
The procedure-specific rating tool CAT-LSH showed superior discriminative validity compared with the validated tool Global Operative Assessment of Laparoscopic Skills (GOALS) 22 used for laparoscopy. Goderstad et al. asserted that this is because CAT-LSH is more detailed for each step of the procedure than GOALS, a finding supported by Frederick et al. 20 The study also uncovered a challenge that arises when assessment is done by non-blinded observers: the operating assistant gave a higher total score than the blinded reviewer on both GOALS and CAT-LSH in all three groups. A reasonable explanation is cognitive bias, e.g. confirmation bias or stereotype bias.
Even though GOALS 22 is used as a comparison in the CAT-LSH study 21, this general rating scale has never been tested and validated in a gynaecologic surgical setting. Interestingly, the same is true of the most widely used global assessment scale, OSATS. A comprehensive study by Hatala et al.13 thoroughly analysed the validity evidence for OSATS in a simulated setting, but the global rating scale must still be validated in a real-life clinical operating room in gynaecologic surgery.
Husslein et al. examined GERT 24, which was able to significantly discriminate between low and high performers by analysing errors. The study also identified procedures more prone to technical errors, which is important knowledge when determining the focus of a procedure-specific assessment tool and the level of detail at which each procedural step should be evaluated. The study is limited by a small sample size and the fact that the videos were retained from a previous study.