1/ Background
Ovarian cancer staging is essential for determining prognosis, planning surgery, and predicting the feasibility of optimal cytoreduction. While CT and MRI are commonly used, expert transvaginal and transabdominal ultrasound has become a powerful diagnostic tool, capable of identifying key predictive signs of tumor spread. Previous studies have demonstrated that expert sonographers can achieve diagnostic accuracy comparable to CT or MRI, especially for pelvic structures.
However, it remains unclear how well non-experts or less experienced clinicians can interpret complex ultrasound findings, particularly when evaluating extra-pelvic disease such as omental caking, diaphragmatic involvement, or peritoneal carcinomatosis. Additionally, ultrasound is traditionally considered operator-dependent, meaning image acquisition and interpretation are tightly linked to examiner expertise.
2/ Objectives
The main objectives were:
- To evaluate the diagnostic performance of ultrasound examiners with varying levels of experience in identifying sites of ovarian cancer spread using standardized videoclips.
- To assess the agreement among examiners (interobserver variability).
- To determine which factors (image quality, diagnostic confidence, examiner experience) influence performance.
- To analyze how accuracy varies across different anatomical locations, from pelvis to upper abdomen.
Ultimately, the authors aimed to determine whether interpretation of ultrasound signs of cancer spread is truly operator-dependent and how training or technological support might improve performance.
3/ Methods
3.1 Study design
This was a prospective diagnostic accuracy study embedded within the ISAAC project (International Study of Advanced Ovarian Cancer by Ultrasound). The design involved:
- Acquisition of systematic ultrasound videoclips from patients with known or suspected advanced ovarian cancer.
- Presentation of these videoclips to a panel of ultrasound examiners for independent review.
3.2 Video acquisition and dataset
Expert sonographers from high-volume oncologic units performed standardized abdominal and pelvic ultrasound examinations, following a consistent scanning protocol.
- 380 videoclips were selected, each representing a specific anatomical site.
- Clips came from patients with a high prevalence of ovarian cancer spread, ensuring examiners were frequently confronted with clinically significant findings.
- Anatomical sites included:
  - Pelvic organs (ovaries, uterus, pelvic peritoneum)
  - Omentum
  - Mesentery
  - Diaphragm
  - Liver surface and subphrenic area
  - Other upper abdominal structures
Each videoclip was linked to a binary ground-truth: infiltration present or absent.
3.3 Examiners
- 25 ultrasound examiners participated.
- Experience levels varied:
  - Highly experienced gynecologic sonographers
  - Moderately experienced clinicians
  - Less experienced practitioners
Examiners were blinded to clinical information, imaging reports, and each other’s evaluations.
3.4 Data collection
For each videoclip, examiners were asked to:
- Indicate whether the site was infiltrated by ovarian cancer.
- Rate image quality (poor, acceptable, good).
- Rate diagnostic confidence.
Performance metrics included:
- Sensitivity
- Specificity
- Overall accuracy
- Cohen’s kappa for interobserver agreement
Mixed-effects models assessed factors influencing diagnostic performance.
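The per-examiner metrics listed above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the toy labels are assumptions made for the example, and 1 stands for "infiltration present" in both the ground truth and the examiner's call.

```python
def confusion_counts(truth, calls):
    """Count (TP, FP, TN, FN) for paired binary labels, 1 = infiltrated."""
    tp = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 1)
    fp = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 1)
    tn = sum(1 for t, c in zip(truth, calls) if t == 0 and c == 0)
    fn = sum(1 for t, c in zip(truth, calls) if t == 1 and c == 0)
    return tp, fp, tn, fn


def diagnostic_metrics(truth, calls):
    """Sensitivity, specificity and overall accuracy for one examiner."""
    tp, fp, tn, fn = confusion_counts(truth, calls)
    nan = float("nan")
    total = tp + fp + tn + fn
    return {
        # Share of truly infiltrated sites the examiner called positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of truly uninvolved sites the examiner called negative.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Share of all videoclips classified correctly.
        "accuracy": (tp + tn) / total if total else nan,
    }


# Invented toy labels for ten videoclips (not study data).
truth = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
calls = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
metrics = diagnostic_metrics(truth, calls)
```

On these toy labels the examiner misses one infiltrated site and overcalls one uninvolved site, so sensitivity is 5/6, specificity is 3/4, and accuracy is 8/10.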
4/ Results
4.1 Overall diagnostic accuracy
Contrary to expectations, diagnostic accuracy was very high across examiners.
- Median correct classification for most anatomical sites ranged from 90% to 100%.
- Even less experienced examiners performed surprisingly well when reviewing high-quality, expert-acquired videoclips.
This suggests that when images are optimally obtained, interpretation of major signs of cancer spread is more robust than previously assumed.
4.2 Variation by anatomical site
Accuracy was highest in pelvic organs and progressively declined as the anatomical site moved upward:
- Pelvis: near-perfect performance
- Lower abdominal regions: moderately high
- Upper abdomen (diaphragm, liver surface): lowest accuracy
This mirrors the inherent complexity of upper abdominal imaging, where structures are deeper, partially obscured by bowel gas, and require more technical skill for optimal acquisition.
Even in upper abdominal sites, however, accuracy remained reasonable, though with greater interobserver variability.
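One way to obtain a per-site summary like the one above is to compute each examiner's accuracy within a site and then take the median across examiners. The sketch below assumes a hypothetical record layout of (examiner, site, correct) and uses invented toy records; how the authors actually aggregated results is not described here.

```python
from collections import defaultdict
from statistics import median


def median_accuracy_by_site(records):
    """Median across examiners of per-examiner accuracy, for each site.

    records: iterable of (examiner_id, site, is_correct) tuples
    (a hypothetical layout chosen for this example).
    """
    # (site, examiner) -> [number correct, number reviewed]
    cells = defaultdict(lambda: [0, 0])
    for examiner, site, is_correct in records:
        cell = cells[(site, examiner)]
        cell[0] += int(bool(is_correct))
        cell[1] += 1

    # site -> list of per-examiner accuracies
    by_site = defaultdict(list)
    for (site, _examiner), (n_correct, n_total) in cells.items():
        by_site[site].append(n_correct / n_total)

    return {site: median(values) for site, values in by_site.items()}


# Invented toy records: three examiners, two clips per site (not study data).
records = [
    ("A", "pelvis", True), ("A", "pelvis", True),
    ("B", "pelvis", True), ("B", "pelvis", True),
    ("C", "pelvis", True), ("C", "pelvis", False),
    ("A", "diaphragm", True), ("A", "diaphragm", False),
    ("B", "diaphragm", False), ("B", "diaphragm", False),
    ("C", "diaphragm", True), ("C", "diaphragm", True),
]
site_medians = median_accuracy_by_site(records)
```

The median is less sensitive than the mean to a single examiner who performs unusually poorly at one site, which may be why a median is the statistic reported for correct classification.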
4.3 Influence of examiner experience
Surprisingly, examiner experience did not significantly affect accuracy in most regions. Less experienced clinicians performed comparably to experts, likely because:
- Videoclips were acquired under optimal conditions.
- Lesions selected were often large or pronounced.
- Interpretation was simplified by standardized video quality.
This suggests that part of ultrasound’s perceived operator dependence may relate more to image acquisition than to image interpretation.
4.4 Image quality and diagnostic confidence
The strongest predictors of correct interpretation were:
- High image quality
- High diagnostic confidence
Both correlated strongly with accuracy.
When examiners rated a videoclip as poor quality or were unsure, accuracy dropped substantially.
This finding suggests that improving acquisition techniques and standardizing videoclips may significantly enhance diagnostic consistency across clinicians.
4.5 Interobserver agreement
Interobserver reliability (Cohen’s kappa):
- Substantial to almost perfect in pelvic and mid-abdominal sites
- Moderate to substantial in upper abdominal sites
Kappa values ranged from 0.68 to 0.99, indicating substantial to almost perfect agreement overall.
This is impressive given the diversity of examiners and complexity of structures evaluated.
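For reference, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. The sketch below is a plain-Python illustration with invented ratings. Cohen's kappa is defined for a pair of raters, and how the study combined values across its 25 examiners is not stated here, so the pairwise loop is only an assumption. The labels follow the commonly used Landis and Koch bands.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' labels on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum((a.count(k) / n) * (b.count(k) / n) for k in set(a) | set(b))
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal label for a kappa value using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented ratings from three hypothetical examiners on eight clips.
ratings = {
    "r1": [1, 1, 0, 1, 0, 0, 1, 1],
    "r2": [1, 1, 0, 1, 0, 1, 1, 1],
    "r3": [1, 0, 0, 1, 0, 0, 1, 1],
}
pairwise = {
    (i, j): cohen_kappa(ratings[i], ratings[j])
    for i, j in combinations(sorted(ratings), 2)
}
```

On this scale, the reported endpoints of 0.68 and 0.99 fall in the "substantial" and "almost perfect" bands respectively.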
5/ Interpretation
The study challenges the assumption that ultrasound interpretation of ovarian cancer spread is highly examiner-dependent. Instead, once high-quality images are obtained through standardized protocols, the interpretation of spread patterns is remarkably consistent, even when examiners have different experience levels.
However, the lower performance in the upper abdomen highlights a true limitation and supports continued reliance on CT/MRI for full staging unless the ultrasound operator is extremely skilled.
The authors caution that the study’s performance likely represents an optimistic upper bound, because:
- Videoclips were selected from known cancer cases.
- Images were acquired by experts under ideal conditions.
- Real-world scans may be lower quality or ambiguous.
Still, the findings demonstrate the potential for remote review, training libraries, and possibly AI-assisted interpretation, especially in low-resource settings.
6/ Strengths and Limitations
Strengths
- Large dataset (380 videoclips) covering multiple anatomical sites.
- Standardization reduces confounding by acquisition skills.
- Inclusion of examiners with varied expertise.
- Rigorous statistical methodology.
Limitations
- High pre-test probability (many positive cases), which may inflate accuracy.
- Artificial environment: real-time scanning involves dynamic manipulation and more noise.
- Upper abdominal imaging in real patients is more challenging than depicted.
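The effect of a high pre-test probability can be made concrete with a small calculation. Overall accuracy is a prevalence-weighted mix of sensitivity and specificity, so when sensitivity exceeds specificity, a sample rich in positive cases yields a higher accuracy than the same examiner would achieve in a lower-prevalence population. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def accuracy_at_prevalence(sensitivity, specificity, prevalence):
    """Overall accuracy as a prevalence-weighted mix of sens and spec."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from expected cell proportions at a prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    nan = float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else nan
    npv = tn / (tn + fn) if (tn + fn) else nan
    return ppv, npv


# Hypothetical examiner: sensitivity 0.95, specificity 0.80.
SENS, SPEC = 0.95, 0.80

# Same examiner, two hypothetical case mixes.
acc_enriched = accuracy_at_prevalence(SENS, SPEC, 0.80)  # many positive cases
acc_general = accuracy_at_prevalence(SENS, SPEC, 0.20)   # few positive cases

ppv_enriched, npv_enriched = predictive_values(SENS, SPEC, 0.80)
ppv_general, npv_general = predictive_values(SENS, SPEC, 0.20)
```

With these assumed values, accuracy is 0.92 in the enriched sample and 0.83 in the lower-prevalence one, although the examiner's sensitivity and specificity are unchanged. The positive predictive value likewise falls as prevalence drops.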
7/ Teaching and Clinical Implications
- Ultrasound accuracy for assessing ovarian cancer spread can be very high when images are well acquired.
- Training should prioritize acquisition skills, especially for upper abdominal areas.
- Interpretation itself may be less dependent on experience than previously thought.
- These data support the idea of AI models or centralized expert review for complex cases.
- Although promising, ultrasound staging cannot fully replace CT/MRI for assessing upper abdominal involvement.