VHSMarker and the Canine Cardiac Keypoint (CCK) Dataset: A Benchmark for Veterinary Cardiac X-ray Analysis

ICLR 2026 Conference Submission
Anonymous Authors
Veterinary AI, Canine Cardiology, Keypoint Detection, Benchmark Datasets, State-Space Models
Abstract:

We present VHSMarker, a web-based annotation tool that enables rapid and standardized labeling of six cardiac keypoints in canine thoracic radiographs. VHSMarker reduces annotation time to 10–12 seconds per image while supporting real-time vertebral heart score (VHS) calculation, model-assisted prediction, and quality control. Using this tool, we constructed the Canine Cardiac Keypoint (CCK) Dataset, a large-scale benchmark of 21,465 annotated radiographs from 12,385 dogs across 144 breeds and additional mixed-breed cases, making it the largest curated resource for canine cardiac analysis to date. To demonstrate the utility of this dataset, we introduce MambaVHS, a baseline model that integrates Mamba blocks for long-range sequence modeling with convolutional layers for local spatial precision. MambaVHS achieves 91.8% test accuracy, surpassing 13 strong baselines including ConvNeXt and EfficientNetB7, and establishes state-space modeling as a promising direction for veterinary imaging. Together, the tool, dataset, and baseline model provide the first reproducible benchmark for automated VHS estimation and a foundation for future research in veterinary cardiology. The source code and dataset are available on our project website: https://anonymousgenai.github.io/vhsmarker.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces three contributions: VHSMarker, a web-based annotation tool for cardiac keypoint labeling; the Canine Cardiac Keypoint (CCK) Dataset comprising 21,465 annotated radiographs; and MambaVHS, a baseline model integrating Mamba state-space blocks with convolutional layers. Within the taxonomy, the work resides in the 'Hybrid and Transformer-Enhanced Keypoint Models' leaf under 'Deep Learning Architectures for Keypoint Detection'. This leaf contains only three papers total, indicating a relatively sparse research direction compared to the more crowded 'EfficientNet-Based Keypoint Localization' leaf with eight papers and the 'Regression-Based VHS Prediction' leaf with eight papers.

The taxonomy reveals that most prior work clusters in two neighboring branches: pure CNN architectures (EfficientNet, ResNet, HRNet, ConvNeXt) for keypoint detection, and direct VHS regression or classification methods that bypass explicit landmark localization. The paper's hybrid approach—combining state-space modeling with convolutional layers—sits at the intersection of these directions. Nearby leaves include 'Specialized CNN Architectures for Landmark Detection' and 'Object Detection Frameworks for VHS Keypoints', both of which rely on standard convolutional backbones without transformer or state-space components. The taxonomy's scope notes clarify that this leaf specifically covers architectures integrating attention mechanisms or state-space models, distinguishing it from pure CNN approaches.
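
To make the distinction between local convolutional features and long-range state-space features concrete, the following toy sketch may help. It is an illustration under stated assumptions, not the MambaVHS architecture: it uses a fixed-parameter linear recurrence on a 1D signal, whereas Mamba uses a selective scan whose parameters depend on the input. The names `ssm_scan`, `conv1d_same`, and `hybrid_block`, and all parameter values, are hypothetical.

```python
def ssm_scan(x, a=0.9, b=0.1, c=1.0):
    """Toy linear state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    Assumption: fixed scalar parameters. This is NOT Mamba's selective scan.
    """
    h = 0.0
    out = []
    for value in x:
        h = a * h + b * value  # the state carries information from all earlier steps
        out.append(c * h)
    return out


def conv1d_same(x, kernel):
    """Zero-padded 1D correlation with an odd-length kernel (local context only)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def hybrid_block(x, kernel=(0.25, 0.5, 0.25)):
    """Sum a local (convolutional) branch and a long-range (state-space) branch."""
    local = conv1d_same(x, kernel)
    long_range = ssm_scan(x)
    return [l + g for l, g in zip(local, long_range)]


# An impulse at position 0: the convolution responds only near the impulse,
# while the recurrence still carries a decayed trace at the end of the sequence.
signal = [1.0] + [0.0] * 9
local_response = conv1d_same(signal, (0.25, 0.5, 0.25))
long_response = ssm_scan(signal)
mixed = hybrid_block(signal)
```

In this sketch the convolutional response is exactly zero a few positions away from the impulse, while the state-space response decays gradually, which is the intuition behind pairing the two kinds of layers.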

Among the 17 candidates examined, the single candidate reviewed for the annotation tool (VHSMarker) did not refute the contribution. For the CCK Dataset, nine candidates were examined and two appear to provide overlapping prior work, suggesting that comparable large-scale annotated canine cardiac datasets may already exist even within this limited search scope. For MambaVHS, seven candidates were examined and none refuted the contribution, indicating that state-space modeling for VHS estimation is less explored among the top semantic matches. Because the search covered only 17 candidates in total, these findings reflect a focused subset of the literature rather than exhaustive coverage.

Based on the top-17 semantic matches and taxonomy structure, the work appears to occupy a less crowded methodological niche (hybrid state-space models) while addressing a task with established CNN and regression baselines. The dataset contribution shows some overlap with prior resources among examined candidates, though the annotation tool and model architecture appear more distinctive within the limited search scope. The analysis does not cover broader veterinary imaging literature or recent preprints outside the candidate pool.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 17
Refutable Papers: 2

Research Landscape Overview

Core task: Automated vertebral heart score estimation in canine radiographs. The field centers on replacing manual VHS measurement, a time-consuming clinical procedure, with computational methods that detect anatomical landmarks and compute cardiac dimensions relative to vertebral length. The taxonomy reveals several complementary branches: Deep Learning Architectures for Keypoint Detection employ CNNs and transformers to localize cardiac and vertebral points directly; Segmentation and Volumetric Approaches delineate heart boundaries for area or volume-based metrics; Direct VHS Regression and Classification predict scores end-to-end without explicit landmark detection; Data Augmentation and Quality Enhancement address limited training data through synthetic generation and image preprocessing; Clinical Validation and Comparative Studies assess agreement with expert radiologists and commercial tools like MetronMind Validation[13]; Manual VHS Methodology and Breed-Specific Reference Values document traditional protocols and population norms across breeds such as Norwich Terrier VHS[9] and Turkish Kangal VHS[19]; and Educational Resources and Datasets provide annotated radiographs like Thoracic Educational Dataset[26] to support model development.

Recent work has concentrated on keypoint-based architectures, balancing accuracy and interpretability. Many studies adopt EfficientNet backbones for feature extraction, while a smaller number explore hybrid designs that integrate transformer attention or state-space mechanisms to capture long-range spatial dependencies. VHSMarker CCK Dataset[0] falls within this Hybrid and Transformer-Enhanced Keypoint Models cluster, combining convolutional layers with Mamba state-space blocks to improve landmark localization precision. This approach contrasts with purely convolutional methods like Precision VHS EfficientNet[2] and HRNet VHS Prediction[5], which rely on hierarchical feature pyramids, and with direct regression frameworks such as VHS Cardiomegaly Deep Learning[3] that bypass explicit keypoint detection. A central trade-off persists between model complexity (transformer layers, for example, demand more computation) and clinical robustness, particularly when radiograph quality varies. Ongoing questions include optimal augmentation strategies, generalization across diverse breeds, and whether interpretable keypoint outputs or streamlined regression better serves veterinary practice.
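
The keypoint-based route described above (detect landmarks, then relate cardiac dimensions to vertebral length) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the six keypoints are ordered A, B (endpoints of the cardiac long axis), C, D (endpoints of the short axis), and E, F (endpoints of a vertebral reference segment), and it assumes the formula VHS = 6 (AB + CD) / EF that appears in related keypoint-based VHS work. Neither the ordering nor the scaling factor is stated in this report, and the function name `vhs_from_keypoints` is hypothetical.

```python
import math


def vhs_from_keypoints(points):
    """Estimate VHS from six (x, y) keypoints ordered A, B, C, D, E, F.

    Assumed convention (not confirmed by the report):
      AB = cardiac long axis, CD = cardiac short axis,
      EF = vertebral reference segment, VHS = 6 * (AB + CD) / EF.
    """
    if len(points) != 6:
        raise ValueError("expected exactly six keypoints")
    a, b, c, d, e, f = points
    long_axis = math.dist(a, b)
    short_axis = math.dist(c, d)
    vertebral = math.dist(e, f)
    if vertebral == 0:
        raise ValueError("vertebral reference segment has zero length")
    return 6.0 * (long_axis + short_axis) / vertebral


# Synthetic pixel coordinates chosen for easy arithmetic, not real annotations.
example = [(0, 0), (0, 50), (0, 0), (40, 0), (0, 0), (60, 0)]
score = vhs_from_keypoints(example)  # 6 * (50 + 40) / 60 = 9.0
```

Because the score is a ratio of lengths measured in the same image, it does not depend on pixel spacing, which is one reason keypoint outputs are easy to audit against a manual measurement.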

Claimed Contributions

VHSMarker: Web-based annotation tool for canine cardiac keypoints

The authors introduce VHSMarker, a clinician-oriented web tool that reduces annotation time from over a minute to 10–12 seconds per image while supporting real-time keypoint placement, automated VHS calculation, built-in quality checks, and seamless data export for scalable dataset creation.

1 retrieved paper

Canine Cardiac Keypoint (CCK) Dataset

The authors constructed a large-scale benchmark dataset comprising 21,465 annotated canine thoracic radiographs from 12,385 dogs across 144 breeds, making it the largest curated resource for canine cardiac analysis and providing a standardized benchmark for training and evaluation.

9 retrieved papers (can refute)

MambaVHS: Baseline model integrating Mamba blocks for VHS estimation

The authors propose MambaVHS, a hierarchical deep learning model that integrates state-space modeling (Mamba blocks) with convolutional layers to achieve robust and accurate VHS prediction, achieving 91.8% test accuracy and establishing state-space modeling as a promising direction for veterinary imaging.

7 retrieved papers
