Crowd-Sourced Assessment of Technical Skills for Validation of Basic Laparoscopic Urologic Skills Tasks

Timothy M. Kowalewski, Bryan Comstock, Robert Sweet, Cory Schaffhausen, Ashleigh Menhadji, Timothy Averch, Geoffrey Box, Timothy Brand, Michael Ferrandino, Jihad Kaouk, Bodo Knudsen, Jaime Landman, Benjamin Lee, Bradley F. Schwartz, Elspeth McDougall, Thomas S. Lendvay

Research output: Contribution to journal › Article

23 Citations (Scopus)

Abstract

Purpose: The BLUS (Basic Laparoscopic Urologic Skills) consortium sought to address the construct validity of BLUS tasks and the wider problem of accurate, scalable and affordable skill evaluation by investigating the concordance of 2 novel candidate methods, automated motion metrics and crowdsourcing, with faculty panel scores.

Materials and Methods: A faculty panel of 5 surgeons and anonymous crowdworkers blindly reviewed a randomized sequence of a representative sample of 24 videos (12 pegboard and 12 suturing) extracted from the BLUS validation study (454), using the GOALS (Global Objective Assessment of Laparoscopic Skills) survey tool with appended pass-fail anchors via the same web-based user interface. Pre-recorded motion metrics (tool path length, jerk cost, etc.) were available for each video. Cronbach's alpha, Pearson's r and ROC with AUC statistics were used to evaluate concordance between continuous scores, and between pass-fail criteria, among the 3 groups of faculty, crowds and motion metrics.

Results: Crowdworkers provided 1,840 ratings in approximately 48 hours, 60 times faster than the faculty panel. The inter-rater reliability of mean expert and crowd ratings was good (α=0.826). Crowd score derived pass-fail decisions yielded an AUC of 96.9% (95% CI 90.3-100; positive predictive value 100%, negative predictive value 89%). Motion metrics and crowd scores provided similar or nearly identical concordance with faculty panel ratings and pass-fail decisions.

Conclusions: The concordance of crowdsourcing with faculty panels, and the speed of crowd reviews, is sufficiently high to merit further investigation of crowdsourcing alongside automated motion metrics. The overall agreement among faculty, motion metrics and crowdworkers provides evidence supporting the construct validity of 2 of the 4 BLUS tasks.
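
For readers unfamiliar with the statistics named in the abstract, the sketch below shows how Cronbach's alpha, Pearson's r and ROC AUC can be computed for this kind of rating data. It is a minimal illustration with simulated scores, not the authors' analysis pipeline; the 24-video sample size mirrors the abstract, but the score values, the pass cutoff and all variable names are assumptions.

```python
# Illustrative sketch only -- not the authors' analysis code. The data,
# score scale, pass cutoff and variable names are hypothetical; it simply
# demonstrates the three concordance statistics named in the abstract.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score


def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_videos x n_raters) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of rater columns
    item_var = scores.var(axis=0, ddof=1).sum()      # sum of per-column variances
    total_var = scores.sum(axis=1).var(ddof=1)       # variance of row totals
    return (k / (k - 1)) * (1 - item_var / total_var)


# Hypothetical data: mean GOALS scores for 24 videos from two rater groups
rng = np.random.default_rng(0)
faculty_mean = rng.uniform(5, 25, size=24)                # faculty panel means
crowd_mean = faculty_mean + rng.normal(0, 1.5, size=24)   # noisy crowd means

# Inter-rater reliability of the two mean-score series
# (the abstract reports alpha = 0.826 for the real data)
alpha = cronbach_alpha(np.column_stack([faculty_mean, crowd_mean]))

# Concordance of continuous scores
r, p = pearsonr(faculty_mean, crowd_mean)

# Pass-fail discrimination: crowd scores against a faculty-derived pass label
faculty_pass = (faculty_mean >= 15).astype(int)   # arbitrary demo cutoff
auc = roc_auc_score(faculty_pass, crowd_mean)

print(f"alpha={alpha:.3f}  r={r:.3f}  AUC={auc:.3f}")
```

Cronbach's alpha here treats the faculty and crowd mean-score series as two "items" rated across the same videos, which parallels the abstract's description of inter-rater reliability between mean expert and crowd ratings.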

Original language: English (US)
Pages (from-to): 1859-1865
Number of pages: 7
Journal: Journal of Urology (ISSN 0022-5347)
Volume: 195
Issue number: 6
DOI: https://doi.org/10.1016/j.juro.2016.01.005
State: Published - Jun 1 2016
Externally published: Yes


Keywords

  • clinical competence
  • crowdsourcing
  • laparoscopy
  • urologic surgical procedures
  • validation studies

ASJC Scopus subject areas

  • Urology

Cite this

Kowalewski, T. M., Comstock, B., Sweet, R., Schaffhausen, C., Menhadji, A., Averch, T., ... Lendvay, T. S. (2016). Crowd-Sourced Assessment of Technical Skills for Validation of Basic Laparoscopic Urologic Skills Tasks. Journal of Urology, 195(6), 1859-1865. https://doi.org/10.1016/j.juro.2016.01.005
