Developing a video-based method to compare and adjust examiner effects in fully nested OSCEs

Yeates, Peter; Cope, Natalie; Hawarden, Ashley; Bradshaw, Hannah; McCray, Gareth; Homer, Matt

doi:10.1111/medu.13783

Authors

Peter Yeates p.yeates@keele.ac.uk

Natalie Cope n.a.cope@keele.ac.uk

Ashley Hawarden a.hawarden@keele.ac.uk

Hannah Bradshaw

Gareth McCray g.mccray@keele.ac.uk

Matt Homer

Abstract

Background:
Whilst averaging across multiple examiners’ judgements reduces unwanted overall score variability in Objective Structured Clinical Examinations (OSCEs), designs involving several parallel circuits of the OSCE require that different examiner-cohorts collectively judge performances to the same standard in order to avoid bias. Prior research suggests the potential for important examiner-cohort effects in distributed or national exams which could compromise fairness or patient safety, but despite their importance, these effects are rarely investigated, as fully nested assessment designs make them very difficult to study. We describe initial use of a new method to measure and adjust for examiner-cohort effects on students’ scores.

Methods:
We developed Video-based Examiner Score Comparison and Adjustment (VESCA): volunteer students were filmed “live” on 10 out of 12 OSCE stations. Following examination, examiners additionally scored station-specific common-comparator videos, producing partial crossing between examiner-cohorts. Many-Facet Rasch Modelling and Linear Mixed Modelling were used to estimate and adjust for examiner-cohort effects on students’ scores.
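The linking idea can be illustrated with a minimal sketch. It is not the Many-Facet Rasch or Linear Mixed Model analysis used in the study; it is a much simpler fixed-effects stand-in in which each examiner-cohort's stringency is estimated as its mean score on the shared comparator videos minus the mean across cohorts, and live scores are then shifted by that amount. The function names, data layout and all numbers below are hypothetical and chosen only for illustration.

```python
from statistics import mean


def cohort_effects(video_scores):
    """Estimate each cohort's leniency (+) or stringency (-).

    video_scores: {cohort: {video_id: score}} for the common-comparator
    videos (hypothetical layout). Only videos scored by every cohort are
    used, so cohorts are compared on identical performances.
    """
    common = sorted(set.intersection(*(set(s) for s in video_scores.values())))
    cohort_means = {c: mean(s[v] for v in common) for c, s in video_scores.items()}
    grand_mean = mean(cohort_means.values())
    return {c: m - grand_mean for c, m in cohort_means.items()}


def adjust_scores(live_scores, effects):
    """Remove the estimated cohort effect from each student's live score.

    live_scores: {student: (cohort, score)}; in a fully nested design each
    student is seen by exactly one cohort.
    """
    return {s: score - effects[c] for s, (c, score) in live_scores.items()}


# Hypothetical data: two cohorts score the same two comparator videos.
videos = {
    "A": {"v1": 5.0, "v2": 4.0},
    "B": {"v1": 4.5, "v2": 3.5},
}
live = {"s1": ("A", 5.0), "s2": ("B", 5.0)}

effects = cohort_effects(videos)
adjusted = adjust_scores(live, effects)
```

In this toy case cohort A scores the shared videos 0.5 points higher than cohort B, so two students with identical raw scores end up 0.5 points apart after adjustment. A mixed or Rasch model additionally handles partial crossing, station difficulty and measurement error, which this sketch ignores.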

Results:
After accounting for students’ ability, examiner-cohorts differed substantially in their stringency/leniency: for the same student ability, the maximal difference between examiner-cohorts was 0.47 out of 7.0 for global scores (Cohen’s d=0.96) and 5.7% for total percentage scores (Cohen’s d=1.06). Corresponding adjustment of students’ global and total percentage scores altered the theoretical classification of 6.0% of students for both measures (either pass to fail or fail to pass), whilst 8.6-9.5% of students’ scores were altered by at least 0.5 standard deviations of student ability.

Conclusions:
Despite typical reliability, the examiner-cohort which students encountered had a potentially important influence on their score, emphasising the need for adequate sampling and examiner training. Development and validation of VESCA may offer a means to measure and/or adjust for potential systematic differences in scoring patterns which could exist between locations in distributed or national OSCE exams, thereby ensuring equivalence and fairness.

Journal Article Type	Article
Acceptance Date	Nov 7, 2018
Online Publication Date	Dec 21, 2018
Publication Date	2019-03
Publicly Available Date	May 26, 2023
Journal	Medical Education
Print ISSN	0308-0110
Publisher	Wiley
Peer Reviewed	Peer Reviewed
Volume	53
Issue	3
Pages	250-263
DOI	https://doi.org/10.1111/medu.13783
Keywords	assessment, OSCEs, assessor variability, psychometrics
Publisher URL	https://doi.org/10.1111/medu.13783