Strengths and limitations
Even though establishing content validity for procedure-specific
assessment tools requires the Delphi methodology, a consensus-based
approach,30 only Frederick et al. used it among the eight studies in
our scoping review.20 They discussed the potentially confounding variable of the attending
physician providing direct supervision and guidance when evaluating a
novice surgeon. This may explain why novice surgeons' RHAS20 scores did not differ more from those of their more
experienced colleagues. Case mix may also explain this small
difference in scores.
RHAS20 demonstrated both construct and discriminative
validity and appears to be feasible. The authors argue that many of the
skills RHAS measures also apply to hysterectomies performed
laparoscopically or abdominally, as the basic steps of the
procedure are identical. The tool is intended to facilitate surgical
training by tracking progress over time and by giving trainees immediate,
constructive feedback.
The procedure-specific rating tool CAT-LSH showed better
discriminative validity than the Global Operative
Assessment of Laparoscopic Skills (GOALS),22 a validated general tool for
laparoscopy. Goderstad et al. attributed this to CAT-LSH assessing
each step of the procedure in more detail than
GOALS, an explanation supported by Frederick et al.20 The study
also uncovered a challenge of assessment by non-blinded
observers: in all three groups, the operating assistant gave higher total scores
than the blinded reviewer on both GOALS and CAT-LSH.
A plausible explanation is cognitive bias, such as
confirmation bias or stereotype bias.
Although GOALS22 served as the comparator in the
CAT-LSH study,21 this general rating scale has never
been tested and validated in a gynaecologic surgical setting.
Interestingly, the same is true of the most widely used global
assessment scale, OSATS. A comprehensive study by Hatala et
al.13 thoroughly analysed the validity evidence for
OSATS in a simulated setting, but its global rating scale has yet to be
validated in a real-life clinical operating room in gynaecologic
surgery.
Husslein et al. examined GERT,24 which, by analysing errors,
discriminated significantly between low and high performers. The study
also identified procedures more prone to technical
errors, which is important knowledge when determining the focus of a
procedure-specific assessment tool and how closely each procedural step
should be evaluated. The study is limited by its small sample size and by
its use of videos retained from a previous study.