SURGICAL RESEARCH
Assessing ovarian cancer via laparoscopic video: inter-rater and intra-rater reliability
Published Dec 1, 2025 in the International Journal of Gynecological Cancer
Kjestine Emilie Mølle, Catrine Carlstein, Mikkel Rosendahl, Anders Tolver, Teodor Grantcharov, Jette Led Sørensen, Jeanett Strandbygaard
Overview
This study evaluated the reliability of assessing ovarian cancer disease burden from laparoscopic video recordings using the Predictive Index Value (PIV) model, a validated scoring system for estimating the likelihood of complete cytoreductive surgery. Because most women with ovarian cancer present with advanced-stage disease, accurate initial staging is critical for deciding whether patients should undergo primary debulking surgery or neoadjuvant chemotherapy. The PIV score is based on the assessment of seven intra-abdominal areas during laparoscopy, so treatment decisions rely heavily on consistent and accurate scoring by gynecologic oncologists.
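As a rough illustration of how a PIV total is aggregated, here is a minimal sketch in the style of the commonly cited Fagotti laparoscopic index. The seven area names, the 0/2 scoring per area, and the cutoff of 8 are assumptions drawn from that model; the source above does not spell them out.

```python
# Sketch of a Fagotti-style Predictive Index Value (PIV).
# ASSUMPTIONS: the area list, the 0-or-2 score per area, and the >= 8
# cutoff follow the widely used Fagotti model, not this study's text.

AREAS = [
    "peritoneal_carcinomatosis",
    "diaphragmatic_disease",
    "mesenteric_retraction",
    "bowel_infiltration",
    "stomach_infiltration",
    "liver_metastases",
    "omental_cake",
]

def piv_score(findings: dict) -> int:
    """Each area with disease present scores 2 points; total ranges 0-14."""
    return sum(2 for area in AREAS if findings.get(area, False))

def favors_primary_debulking(findings: dict, cutoff: int = 8) -> bool:
    """A PIV below the cutoff suggests complete cytoreduction is feasible."""
    return piv_score(findings) < cutoff
```

For example, a patient with only omental cake and diaphragmatic disease would score 4 and, under these assumptions, remain a candidate for primary debulking.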
To examine agreement among clinicians, the authors collected 20 laparoscopic videos (including five duplicates) recorded between October 2021 and January 2024 at a European Society of Gynecological Oncology Centre of Excellence in Denmark. Each video was edited to show the seven anatomical areas included in the PIV model. Twenty-one participants—eight gynecologic oncologists, seven gynecologists specializing in benign conditions, and six residents (five of whom completed all videos)—assessed the videos using the PIV. Inter-rater and intra-rater reliability were analyzed using kappa statistics and compared with real-time laparoscopic scores.
Results
Agreement with real-time laparoscopic assessments ranged from 58.4% to 81.7% across intra-abdominal areas. The probability of assigning correct scores varied widely among participants, ranging from 34% (95% CI 14.9–60.2) to 98.9% (95% CI 95.9–99.7), with gynecologic oncologists achieving the highest accuracy. Overall inter-rater agreement across all participants ranged from moderate to substantial (Light's κ = 0.436–0.624). Within specific intra-abdominal regions, agreement among gynecologic oncologists ranged from fair to almost perfect (Light's κ = 0.181–0.829).
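Light's kappa, the statistic reported above, is simply the mean of Cohen's kappa taken over every pair of raters. A minimal sketch (the ratings in the usage example are illustrative, not the study's data):

```python
# Light's kappa: average pairwise Cohen's kappa across raters.
from itertools import combinations

def cohens_kappa(a: list, b: list) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    n = len(a)
    categories = set(a) | set(b)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each rater's marginal frequencies.
    p_expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return 1.0 if p_expected == 1 else (p_observed - p_expected) / (1 - p_expected)

def lights_kappa(ratings: list) -> float:
    """ratings: one list of item scores per rater; returns mean pairwise kappa."""
    pairs = list(combinations(ratings, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)
```

Three raters who score every item identically yield κ = 1.0, while two raters whose agreement is exactly what chance predicts yield κ = 0.0, which is why values in the 0.4–0.6 range are read as only moderate reliability.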
Intra-rater variability accounted for 49.2% of total variability and inter-rater variability for 50.8%, indicating that individual inconsistency and differences between assessors contributed almost equally to scoring variation. Although gynecologic oncologists performed better than non-oncologists and residents, kappa values remained relatively modest overall. The findings suggest that edited video clips reduce accuracy in evaluating disease burden compared with real-time laparoscopy, and that PIV assessment should remain a task for trained gynecologic oncologists with specialized expertise.