Assessment tool Scoring Generalisation Extrapolation Strength Weakness
Objective Structured Assessment of Technical Skills (OSATS). 21
Comparing OSATS scores over time
Construct validity was demonstrated as a significant rise in score with increasing caseload: 1.10 OSATS points per assessed procedure (p=0.008, 95% CI 0.44–1.77)
Creating learning curves to identify residents in need of more guidance
No blinded assessment and self-evaluation; small sample size; high interrater variation; lack of objectivity; not adjusted for case mix
Vaginal Surgical Skills Index (VSSI). 18
Comparing GRS with VSSI and adding a visual analogue scale for overall performance
Interrater reliability was 0.53 and intrarater reliability was 0.82
Able to discriminate training levels for VSSI scores
27 surgeons from two institutions; multiple expert reviewers; focus on case mix
Assessment items not procedure specific and can be applied to laparoscopic surgery in general
Hopkins Assessment of Surgical Competency (HASC). 19
Surgeons rated by supervisors on general surgical skills and case-specific surgical skills
High internal consistency reliability of the items (Cronbach’s alpha = 0.80, p<0.001)
Discriminative validity for inexperienced vs intermediate surgeons (p<0.001)
362 surgical cases were evaluated
No blinded assessment and self-evaluation; many different procedures evaluated; not adjusted for case mix
Objective Structured Assessment of Laparoscopic Salpingectomy (OSA-LS). 20
Surgeons rated by OSA-LS
Interrater reliability = 0.831
Discriminative validity for inexperienced vs intermediate vs experienced surgeons (p<0.03)
Blinded
Small sample size; not adjusted for case mix
Robotic Hysterectomy Assessment Score (RHAS). 15
Surgeons rated by expert viewers using RHAS
Interrater reliability for total domain score (p>0.006; p<0.001)
Differences demonstrated between experts, advanced beginners and novices in all domains except vaginal cuff closure
52 blinded video recordings; multiple expert reviewers
Confounding variable when assessing novice surgeons is the presence of an attending physician providing direct feedback; not adjusted for case mix
Competence Assessment for Laparoscopic Supracervical Hysterectomy (CAT-LSH). 16
Comparing GOALS and CAT-LSH
Interrater reliability = 0.75
Discriminative validity for inexperienced vs intermediate (p<0.006) and intermediate vs experts (p<0.001)
Video recording and blinded expert reviewers
Small sample size; not adjusted for case mix
Feasible rating scale for formative and summative feedback. 23
Surgeons rated by expert viewers using 12-item procedure-specific checklist
Internal consistency reliability of the items: Cronbach’s alpha = 0.95 (p<0.001); interrater reliability = 0.996 for one rater and 0.0998 for two raters
Discriminative validity for beginners and experienced surgeons (p<0.001)
Video recording and blinded expert reviewers
Small sample size; not adjusted for case mix
Generic Error Rating Tool (GERT). 24
OSATS scores used to establish and measure technical skills, to group surgeons as high or low performers and to correlate scores with GERT in an inverse relationship (more skilled surgeons make fewer errors)
Interrater reliability high (>0.95); intrarater reliability significant (>0.95)
Significant negative correlation between OSATS and GERT scores
Video recording and blinded expert reviewers; analysis of operative substeps more prone to technical errors; captures near misses (events that may result in injury but did not, either by chance or timely intervention)
Although interrater reliability was high, not every error was rated identically by the two reviewers; not adjusted for case mix
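
Several rows of the table report internal consistency as Cronbach’s alpha. As a minimal sketch of how such a coefficient can be computed from item-level ratings, the following assumes one row of scores per assessed case and one column per rating item, and uses sample variances. The function name `cronbach_alpha` and the example scores are hypothetical illustrations and are not taken from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of cases, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two cases")
    # Sample variance of each item (column) across all cases.
    item_vars = [variance([row[i] for row in ratings]) for i in range(k)]
    # Sample variance of the summed score of each case.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 assessed cases rated on 3 items (illustrative only).
example = [
    [1, 2, 1],
    [2, 3, 3],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(example)
print(f"Cronbach's alpha: {alpha:.4f}")
```

Values closer to 1 indicate that the items vary together across cases. The sketch does not compute the p-values reported alongside alpha in the table.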