ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: Surgical Triplet, Endoscopy, Benchmark, Dataset, Evaluation
Abstract:

Surgical triplet detection is a critical task in surgical video analysis, with significant implications for performance assessment and the training of novice surgeons. However, existing datasets such as CholecT50 lack precise spatial bounding box annotations, and image-level triplet classification alone is insufficient for practical applications. Bounding box annotations are essential to make the task clinically meaningful, as they provide the spatial context necessary for accurate analysis and improved model generalizability. To address these shortcomings, we introduce ProstaTD, a large-scale, multi-institutional dataset for surgical triplet detection, developed from the technically demanding domain of robot-assisted prostatectomy. ProstaTD offers clinically defined temporal boundaries and high-precision bounding box annotations for each structured triplet activity. The dataset comprises 71,775 video frames and 196,490 annotated triplet instances, collected from 21 surgeries performed across multiple institutions, reflecting a broad range of surgical practices and intraoperative conditions. The annotation process was conducted under rigorous medical supervision and involved more than 60 contributors, including practicing surgeons and medically trained annotators, through multiple iterative phases of labeling and verification. To facilitate future general-purpose surgical annotation, we developed two tailored labeling tools that improve the efficiency and scalability of our annotation workflows. In addition, we created a surgical triplet detection evaluation toolkit that enables standardized and reproducible performance assessment across studies. ProstaTD is the largest and most diverse surgical triplet dataset to date, moving the field from simple classification to full detection with precise spatial and temporal boundaries, thereby providing a robust foundation for fair benchmarking.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces ProstaTD, a large-scale dataset for surgical triplet detection in robot-assisted prostatectomy, featuring 71,775 frames and 196,490 annotated triplet instances with bounding boxes across 21 multi-institutional surgeries. It resides in the 'Large-Scale Multi-Institutional Datasets with Bounding Box Annotations' leaf, which contains only one sibling paper. This represents a relatively sparse research direction within the broader taxonomy of eight total papers, suggesting the work addresses an emerging need for spatially annotated surgical triplet datasets beyond existing image-level classification resources like CholecT50.

The taxonomy reveals three main branches: Dataset Development, Methodological Approaches, and Clinical Applications. ProstaTD sits within Dataset Development, adjacent to 'Holistic Surgical Scene Understanding with Pixel-Wise Recognition' (one paper) and separate from methodological leaves addressing deep learning frameworks, disentanglement, and adversarial robustness (three papers total). The dataset's multi-institutional scope and bounding box annotations position it as infrastructure enabling the methodological innovations in neighboring branches, while its prostatectomy focus distinguishes it from broader endoscopic surgery datasets. The sparse population of its leaf suggests limited prior work specifically combining large-scale triplet detection with precise spatial annotations across institutions.

Among 29 candidates examined, the analysis identified potential overlap for all three contributions. The core dataset contribution (10 candidates examined, 1 refutable) shows the most novelty, though one prior work appears to provide similar multi-institutional triplet annotations. The annotation tools contribution (9 candidates, 1 refutable) and evaluation toolkit (10 candidates, 2 refutable) face more substantial prior work, with existing open-source labeling frameworks and benchmark protocols identified. These statistics reflect a focused semantic search rather than exhaustive coverage, indicating that within the examined scope, the dataset's scale and domain specificity appear more distinctive than its tooling and evaluation components.

Based on the limited search of 29 candidates, the work's primary novelty appears to lie in its domain-specific scale and multi-institutional scope for prostatectomy triplet detection with spatial annotations. The sparse taxonomy leaf (one sibling) and contribution-level statistics suggest the dataset addresses a genuine gap, though the annotation tools and benchmarking components encounter more established prior work. This assessment reflects top-K semantic matches and may not capture domain-specific precedents outside the search scope.

Taxonomy

- Core-task Taxonomy Papers: 8
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 29
- Refutable Papers: 4

Research Landscape Overview

Core task: Surgical triplet detection in robot-assisted prostatectomy videos involves identifying and linking three key elements within surgical video frames: instruments, verbs (actions), and targets (anatomical structures). The field's structure, as reflected in the taxonomy, organizes around three main branches: Dataset Development and Annotation Infrastructure, which focuses on creating large-scale, richly annotated video corpora with bounding boxes and triplet labels; Methodological Approaches for Triplet Recognition and Detection, encompassing algorithmic innovations from deep learning architectures to disentanglement strategies; and Clinical Applications and Surgical Practice, which bridges technical advances to real-world surgical workflows and training.

Representative works like ProstaTD Dataset[3] exemplify the dataset development effort by providing multi-institutional annotations, while studies such as Triplet Disentanglement[7] illustrate methodological refinements that decompose the triplet recognition problem into more tractable sub-tasks. Guardian[1] and AI Endoscopy Surgery[2] demonstrate how these technical foundations support clinical decision-making and intraoperative guidance. A particularly active line of work centers on scaling annotation quality and diversity across institutions, addressing challenges such as inter-annotator variability and the choice between pixel-wise and bounding-box granularity, as seen in Pixel-wise Surgical[4]. Methodological debates revolve around whether to treat triplet detection as a unified end-to-end problem or to disentangle instrument detection, action recognition, and target localization into separate stages.

ProstaTD Bridging[0] sits within the Dataset Development and Annotation Infrastructure branch, specifically targeting large-scale multi-institutional datasets with bounding box annotations. Compared to ProstaTD Dataset[3], which established foundational annotation protocols, ProstaTD Bridging[0] appears to extend this infrastructure by addressing cross-institutional harmonization and bridging gaps in annotation consistency. This positions the work as a natural evolution in dataset maturity, complementing methodological advances like those in Deep Learning Action[6] and supporting broader clinical integration efforts exemplified by Triple-console Telesurgery[5] and Single Port Prostatectomy[8].

Claimed Contributions

ProstaTD dataset for surgical triplet detection

The authors present ProstaTD, the first large-scale dataset enabling fully supervised surgical triplet detection at the procedure level. It contains 71,775 frames with 196,490 annotated triplet instances from 21 multi-institutional surgeries, featuring precise bounding boxes and clinically defined temporal boundaries for each triplet.

10 retrieved papers · Can Refute
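As a concrete illustration of what a "fully supervised" triplet-detection label entails, the sketch below shows one plausible per-frame annotation record pairing each (instrument, verb, target) triplet with a bounding box and a clinically defined phase. All field names and vocabulary here are hypothetical; the actual ProstaTD release format is not described in this report.

```python
import json

# Hypothetical annotation record for a single video frame.
# Field names and label vocabulary are illustrative only,
# not the actual ProstaTD release schema.
record = {
    "video_id": "surgery_01",
    "frame_index": 4213,
    "phase": "vesicourethral_anastomosis",  # clinically defined temporal segment
    "triplets": [
        {
            "instrument": "needle_driver",
            "verb": "grasp",
            "target": "prostate",
            "bbox_xyxy": [412, 188, 590, 334],  # pixel coordinates of the box
        }
    ],
}

print(json.dumps(record, indent=2))
```

The key difference from classification-only datasets like CholecT50 is the `bbox_xyxy` field: every triplet instance is localized in the frame, which is what makes detection-style evaluation possible.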
Open-source annotation tools for surgical triplet labeling

The authors developed two dedicated annotation applications (Triplet-labelme and SurgLabel) specifically designed for surgical triplet annotation. These tools support single-frame triplet editing and high-throughput batch labeling, and will be released as open source to facilitate large-scale annotation across diverse surgical procedures.

9 retrieved papers · Can Refute
Evaluation toolkit and benchmark for surgical triplet detection

The authors introduce an evaluation toolkit (ivtdmetrics) tailored for surgical triplet detection benchmarking, supporting metrics such as mAP at various IoU thresholds, precision, recall, and F1-score. They also provide comprehensive benchmarks using state-of-the-art models and propose TDnet as a baseline method.

10 retrieved papers · Can Refute
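The metrics named above can be made concrete with a minimal sketch: match predictions to ground truth one-to-one at a fixed IoU threshold, then count true and false positives to obtain precision, recall, and F1. This is an illustrative reimplementation under those assumptions, not the actual ivtdmetrics API, and it omits the score sweep needed for mAP.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    triplet_id: int   # class index of the (instrument, verb, target) triplet
    box: tuple        # (x1, y1, x2, y2) in pixels
    score: float = 1.0

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def prf1(preds, gts, iou_thr=0.5):
    """Greedy one-to-one matching of predictions to ground truth
    (highest score first, same triplet class, IoU >= threshold),
    then precision / recall / F1 over the matched pairs."""
    preds = sorted(preds, key=lambda d: d.score, reverse=True)
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, iou_thr
        for i, g in enumerate(gts):
            if i in matched or g.triplet_id != p.triplet_id:
                continue
            ov = iou(p.box, g.box)
            if ov >= best_iou:
                best, best_iou = i, ov
        if best is not None:
            matched.add(best)
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A full detection benchmark would additionally compute per-class average precision by sweeping the confidence threshold and averaging over several IoU thresholds, as the toolkit description suggests.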

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

ProstaTD dataset for surgical triplet detection


Contribution

Open-source annotation tools for surgical triplet labeling


Contribution

Evaluation toolkit and benchmark for surgical triplet detection
