Enhanced Trauma Video Review With Computer Vision

Trauma Resuscitation Phase Segmentation and Procedure Detection

Published on

Dec 1, 2025

Annals of Surgery Open

Villarreal, Joshua A. MD; Heo, Jaewoo BS; Wang, Xiaohan PhD; Bain, Andrew MD; Succar, Bahaa MD; Yao, Dong-han MD; Jopling, Jeffrey K. MD; Yeung-Levy, Serena PhD; Dumas, Ryan P. MD

Overview

This methodological study evaluated whether computer vision can automate key components of trauma video review (TVR), a powerful but labor-intensive quality improvement tool. Investigators analyzed 95 de-identified trauma resuscitation videos from a Level I trauma center and, with guidance from a multi-institutional TVR research group, created a standardized framework for four visually discernible phases of resuscitation: pre-arrival, paramedic handover, acute resuscitation, and pre-departure. They also defined an annotation scheme for common bedside procedures, including X-rays, FAST examinations, central line placement, and IV access. An expert annotation team labeled a subset of 30 videos with high interrater reliability, measured by temporal intersection over union (tIoU). These annotations served as the ground truth for model training and evaluation.
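
For readers unfamiliar with tIoU, the following is a minimal sketch of how the overlap between two annotators' time intervals for the same event could be computed. The function name and the example timestamps are hypothetical and are not taken from the study; intervals are assumed to be (start, end) pairs in seconds.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal intersection over union of two (start, end) intervals."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Length of the overlap; zero when the intervals do not intersect.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: two annotators mark the same phase (seconds).
annotator_1 = (12.0, 95.0)
annotator_2 = (15.0, 100.0)
print(round(temporal_iou(annotator_1, annotator_2), 3))
```

A tIoU of 1.0 means the two intervals coincide exactly, and 0.0 means they do not overlap at all.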

The authors then built a two-stage computer vision pipeline using multiview trauma bay video. Spatiotemporal features were first extracted with an Inflated 3D Convolutional Network (I3D) to capture both visual content and motion across up to four synchronized camera angles. These features were passed to two downstream models: a Multi-Stage Temporal Convolutional Network (MS-TCN) for continuous trauma phase segmentation, and ActionFormer, a transformer-based temporal action localization model, for detection and temporal localization of discrete procedures. Phase segmentation was assessed with frame-wise accuracy, edit score, and F1 scores at multiple tIoU thresholds; procedure detection was assessed with average precision and recall.
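
Two of the segmentation metrics named above can be illustrated with a short sketch. It assumes one phase label per frame and uses the common definition of the edit score (a normalized Levenshtein distance over the ordered sequence of segment labels); the function names and toy labels are hypothetical, and the study's exact evaluation code may differ.

```python
from itertools import groupby


def frame_accuracy(pred, truth):
    """Percentage of frames whose predicted label matches the ground truth."""
    if not truth or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and equally long")
    correct = sum(p == t for p, t in zip(pred, truth))
    return 100.0 * correct / len(truth)


def segment_labels(frames):
    """Collapse runs of identical frame labels into one label per segment."""
    return [label for label, _ in groupby(frames)]


def edit_score(pred, truth):
    """100 * (1 - Levenshtein distance / longer segment sequence length)."""
    p, g = segment_labels(pred), segment_labels(truth)
    if not p or not g:
        raise ValueError("pred and truth must be non-empty")
    prev = list(range(len(g) + 1))  # distances for the empty prefix of p
    for i, p_label in enumerate(p, 1):
        curr = [i]
        for j, g_label in enumerate(g, 1):
            cost = 0 if p_label == g_label else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * (1.0 - prev[-1] / max(len(p), len(g)))


# Hypothetical 14-frame clip; the prediction switches phases one frame late.
truth = ["pre"] * 4 + ["handover"] * 3 + ["acute"] * 5 + ["depart"] * 2
pred = ["pre"] * 5 + ["handover"] * 2 + ["acute"] * 5 + ["depart"] * 2
print(frame_accuracy(pred, truth), edit_score(pred, truth))
```

In this toy case the boundary error lowers frame-wise accuracy but leaves the edit score at 100, because the edit score only penalizes errors in the order and number of segments, such as a spurious extra phase.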

Results

The patient cohort (N=95) was predominantly male (68.4%), with a median age of 31 years and mostly blunt trauma (78.9%). Annotators demonstrated excellent agreement (mean tIoU 0.89). The phase segmentation model performed strongly, achieving 98.3% frame-wise accuracy, a 92.1% edit score, and F1 scores of 94.5%, 94.1%, and 86.3% at tIoU thresholds of 0.1, 0.25, and 0.5, respectively, with high mean tIoU across all four phases. Procedure detection showed good discrimination for X-rays and central lines, with average precision exceeding 66% at lenient thresholds, but only modest performance for FAST exams and poor performance for IV access, particularly at stricter tIoU thresholds. These results highlight both the promise of the approach and the need for further optimization for more subtle or visually complex procedures.
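
To see why scores fall as the tIoU threshold rises, the sketch below computes a segment-level F1 at a given threshold. It is a simplified variant of the usual F1@tIoU matching rule, in which each ground-truth segment can be matched by at most one prediction of the same label; the function names and all segment times are hypothetical and are not data from the study.

```python
def tiou(a, b):
    """Temporal intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def f1_at_tiou(pred_segments, true_segments, threshold):
    """Segment-level F1 (in percent); segments are (label, start, end)."""
    matched = set()  # indices of ground-truth segments already claimed
    tp = fp = 0
    for label, p_start, p_end in pred_segments:
        best_iou, best_idx = 0.0, None
        for idx, (t_label, t_start, t_end) in enumerate(true_segments):
            if t_label != label or idx in matched:
                continue
            iou = tiou((p_start, p_end), (t_start, t_end))
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= threshold:
            tp += 1
            matched.add(best_idx)
        else:
            fp += 1
    fn = len(true_segments) - len(matched)  # ground truth never matched
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical phase boundaries in seconds.
true_segs = [("pre", 0, 60), ("handover", 60, 120),
             ("acute", 120, 600), ("depart", 600, 660)]
pred_segs = [("pre", 0, 50), ("handover", 50, 130),
             ("acute", 130, 590), ("depart", 590, 700)]
for thr in (0.5, 0.7, 0.9):
    print(thr, f1_at_tiou(pred_segs, true_segs, thr))
```

With these made-up boundaries every predicted phase counts as correct at a threshold of 0.5, but fewer survive at 0.7 and 0.9, which mirrors the general pattern of lower scores under stricter overlap requirements.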

Automating TVR processes has the potential to revolutionize trauma quality improvement by making video-based performance review and feedback more accessible, scalable, and actionable, ultimately contributing to higher-quality trauma resuscitation care across trauma centers.


Peer-reviewed Research
