ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection
Overview
Overall Novelty Assessment
The paper introduces ProstaTD, a large-scale dataset for surgical triplet detection in robot-assisted prostatectomy, featuring 71,775 frames and 196,490 annotated triplet instances with bounding boxes across 21 multi-institutional surgeries. It resides in the 'Large-Scale Multi-Institutional Datasets with Bounding Box Annotations' leaf, which contains only one sibling paper. This represents a relatively sparse research direction within the broader taxonomy of eight total papers, suggesting the work addresses an emerging need for spatially annotated surgical triplet datasets beyond existing image-level classification resources like CholecT50.
The taxonomy reveals three main branches: Dataset Development, Methodological Approaches, and Clinical Applications. ProstaTD sits within Dataset Development, adjacent to 'Holistic Surgical Scene Understanding with Pixel-Wise Recognition' (one paper) and separate from methodological leaves addressing deep learning frameworks, disentanglement, and adversarial robustness (three papers total). The dataset's multi-institutional scope and bounding box annotations position it as infrastructure enabling the methodological innovations in neighboring branches, while its prostatectomy focus distinguishes it from broader endoscopic surgery datasets. The sparse population of its leaf suggests limited prior work specifically combining large-scale triplet detection with precise spatial annotations across institutions.
Across the 29 candidates examined, the analysis identified potential overlap for all three contributions. The core dataset contribution (10 candidates examined, 1 refutable) appears the most novel, though one prior work seems to provide similar multi-institutional triplet annotations. The annotation tools contribution (9 candidates, 1 refutable) and the evaluation toolkit (10 candidates, 2 refutable) face more substantial overlap with prior work, including existing open-source labeling frameworks and benchmark protocols. These statistics reflect a focused semantic search rather than exhaustive coverage; within the examined scope, the dataset's scale and domain specificity appear more distinctive than its tooling and evaluation components.
Based on the limited search of 29 candidates, the work's primary novelty appears to lie in its domain-specific scale and multi-institutional scope for prostatectomy triplet detection with spatial annotations. The sparse taxonomy leaf (one sibling) and contribution-level statistics suggest the dataset addresses a genuine gap, though the annotation tools and benchmarking components encounter more established prior work. This assessment reflects top-K semantic matches and may not capture domain-specific precedents outside the search scope.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors present ProstaTD, the first large-scale dataset enabling fully supervised surgical triplet detection at the procedure level. It contains 71,775 frames with 196,490 annotated triplet instances from 21 multi-institutional surgeries, featuring precise bounding boxes and clinically defined temporal boundaries for each triplet.
The authors developed two dedicated annotation applications (Triplet-labelme and SurgLabel) specifically designed for surgical triplet annotation. These tools support single-frame triplet editing and high-throughput batch labeling, and will be released as open source to facilitate large-scale annotation across diverse surgical procedures.
The authors introduce an evaluation toolkit (ivtdmetrics) tailored for surgical triplet detection benchmarking, supporting metrics such as mAP at various IoU thresholds, precision, recall, and F1-score. They also provide comprehensive benchmarks using state-of-the-art models and propose TDnet as a baseline method.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[3] ProstaTD: A Large-scale Multi-source Dataset for Structured Surgical Triplet Detection
Contribution Analysis
Detailed comparisons for each claimed contribution
ProstaTD dataset for surgical triplet detection
The authors present ProstaTD, the first large-scale dataset enabling fully supervised surgical triplet detection at the procedure level. It contains 71,775 frames with 196,490 annotated triplet instances from 21 multi-institutional surgeries, featuring precise bounding boxes and clinically defined temporal boundaries for each triplet.
[3] ProstaTD: A Large-scale Multi-source Dataset for Structured Surgical Triplet Detection
[4] Pixel-wise recognition for holistic surgical scene understanding
[6] A deep learning framework for surgery action detection
[7] Surgical action triplet recognition via triplet disentanglement
[9] SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge
[10] Estimating surgical urethral length on intraoperative robot-assisted prostatectomy images using artificial intelligence anatomy recognition
[11] Towards holistic surgical scene understanding
[12] A Dataset and Benchmark for Robot-Assisted Radical Prostatectomy With Lymphadenectomy in Surgical Workflow Understanding
[13] A Dataset for Robot-assisted Radical Prostatectomy with Lymphadenectomy in Surgical Workflow Understanding
[14] TriQuery: A Query-Based Model for Surgical Triplet Recognition.
Open-source annotation tools for surgical triplet labeling
The authors developed two dedicated annotation applications (Triplet-labelme and SurgLabel) specifically designed for surgical triplet annotation. These tools support single-frame triplet editing and high-throughput batch labeling, and will be released as open source to facilitate large-scale annotation across diverse surgical procedures.
[3] ProstaTD: A Large-scale Multi-source Dataset for Structured Surgical Triplet Detection
[16] Temset-24k: Densely annotated dataset for indexing multipart endoscopic videos using surgical timeline segmentation
[17] Instrument-tissue-guided surgical action triplet detection via textual-temporal trail exploration
[24] Surgical video workflow analysis via visual-language learning
[25] 'Deep-Onto' network for surgical workflow and context recognition
[26] Frame Selection Methods to Streamline Surgical Video Annotation for Tool Detection Tasks
[27] Grounding Surgical Action Triplets with Instrument Instance Segmentation: A Dataset and Target-Aware Fusion Approach
[28] Surgical Triplet Recognition via Diffusion Model
[29] Web based Object Annotation Tool using a Triplet-ReID Sorting Approach
Evaluation toolkit and benchmark for surgical triplet detection
The authors introduce an evaluation toolkit (ivtdmetrics) tailored for surgical triplet detection benchmarking, supporting metrics such as mAP at various IoU thresholds, precision, recall, and F1-score. They also provide comprehensive benchmarks using state-of-the-art models and propose TDnet as a baseline method.
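To make the detection metrics concrete: mAP at an IoU threshold is built on per-frame matching of predicted boxes to ground-truth boxes of the same triplet label. The sketch below shows precision, recall, and F1 at a single IoU threshold using greedy confidence-ordered matching. It is an illustrative implementation of the general detection-matching protocol, not the actual ivtdmetrics API; the function names and data layout (`triplet_id, confidence, box`) are assumptions for this example.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, gts, iou_thr=0.5):
    """Greedy confidence-ordered matching of predictions to ground truth.

    preds: list of (triplet_id, confidence, box); gts: list of (triplet_id, box).
    A prediction is a true positive if it matches an unmatched ground-truth
    box with the same triplet label at IoU >= iou_thr. Returns (P, R, F1).
    """
    matched = [False] * len(gts)
    tp = fp = 0
    for triplet_id, _conf, box in sorted(preds, key=lambda p: -p[1]):
        best, best_iou = None, iou_thr
        for i, (gt_id, gt_box) in enumerate(gts):
            if matched[i] or gt_id != triplet_id:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            matched[best] = True
            tp += 1
        else:
            fp += 1
    fn = matched.count(False)  # unmatched ground truth
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

mAP extends this idea by sweeping the confidence threshold to trace a precision-recall curve per triplet class, averaging the area under it across classes, and (in COCO-style protocols) across several IoU thresholds.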