GenCape: Structure-Inductive Generative Modeling for Category-Agnostic Pose Estimation
Overview
Claimed Contributions
The authors propose GenCape, a generative framework that uses an iterative Structure-aware Variational Autoencoder (i-SVAE) to learn instance-specific keypoint relationships (adjacency matrices) directly from support images, eliminating the need for predefined anatomical priors or textual descriptions.
The authors introduce a Compositional Graph Transfer (CGT) module that fuses multiple latent graph hypotheses into a query-aware structure using Bayesian fusion and attention-based reweighting, improving robustness under noisy or mismatched support scenarios.
The authors report that GenCape achieves state-of-the-art results on the MP-100 benchmark in both 1-shot and 5-shot settings, outperforming existing methods without requiring external structural or textual annotations.
Contribution Analysis
Detailed comparisons for each claimed contribution
GenCape framework with iterative Structure-aware Variational Autoencoder
The authors propose GenCape, a generative framework that uses an iterative Structure-aware Variational Autoencoder (i-SVAE) to learn instance-specific keypoint relationships (adjacency matrices) directly from support images, eliminating the need for predefined anatomical priors or textual descriptions.
[11] SCAPE: A simple and strong category-agnostic pose estimator
[35] PriorMotion: Generative class-agnostic motion prediction with raster-vector motion field priors
[36] GHUM & GHUML: Generative 3D human shape and articulated pose models
[37] AnimaX: Animating the inanimate in 3D with joint video-pose diffusion models
[38] Weakly-supervised action localization by generative attention modeling
[39] Using EfficientNet-B7 (CNN), variational auto encoder (VAE) and Siamese twins' networks to evaluate human exercises as super objects in a TSSCI images
[40] SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting
[41] ROLA: Real-world object-centric learning with attention optimization
[42] Modulating Depth Map Features to Estimate 3D Human Pose via Multi-Task Variational Autoencoders
[43] NeRF Explored: A Comprehensive Analysis of Neural Radiance Field in 3D Vision
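To make the idea of generating a keypoint adjacency matrix from a latent code more concrete, here is a minimal sketch in the style of a variational graph autoencoder. It is not the authors' i-SVAE. The inner-product decoder, the function names (`sample_latent`, `decode_adjacency`, `kl_to_standard_normal`), and the toy per-keypoint statistics are all assumptions made for illustration, and the iterative refinement that the "i" in i-SVAE refers to is omitted.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def decode_adjacency(latents):
    """Hypothetical inner-product decoder: A[i][j] = sigmoid(z_i . z_j).

    Self-loops are set to zero. This decoder is an assumption borrowed from
    variational graph autoencoders, not the decoder described in the paper.
    """
    n = len(latents)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dot = sum(a * b for a, b in zip(latents[i], latents[j]))
                adj[i][j] = 1.0 / (1.0 + math.exp(-dot))
    return adj


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Toy usage: three keypoints, each with a 2-D latent (values are made up).
rng = random.Random(0)
keypoint_stats = [([0.5, -0.1], [-1.0, -1.0]),
                  ([0.4, 0.2], [-1.0, -1.0]),
                  ([-0.3, 0.6], [-1.0, -1.0])]
latents = [sample_latent(mu, lv, rng) for mu, lv in keypoint_stats]
adjacency = decode_adjacency(latents)
kl_total = sum(kl_to_standard_normal(mu, lv) for mu, lv in keypoint_stats)
```

Sampling the latents several times yields several adjacency hypotheses for the same support instance, which is the kind of input a downstream fusion step would consume.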
Compositional Graph Transfer mechanism
The authors introduce a Compositional Graph Transfer (CGT) module that fuses multiple latent graph hypotheses into a query-aware structure using Bayesian fusion and attention-based reweighting, improving robustness under noisy or mismatched support scenarios.
[44] Heterogeneous graph embedding by aggregating meta-path and meta-structure through attention mechanism
[45] Attention-guided fusion of transformers and CNNs for enhanced medical image segmentation
[46] BaGFN: Broad Attentive Graph Fusion Network for High-Order Feature Interactions
[47] Modality-Guided Dynamic Graph Fusion and Temporal Diffusion for Self-Supervised RGB-T Tracking
[48] A Multimodal Graph Recommendation Method Based on Cross-Attention Fusion
[49] Layer-wise representation fusion for compositional generalization
[50] SAFFNet: Self-attention based on Fourier frequency domain filter network for visual question answering
[51] AMGNet: An Attention-Guided Multi-Graph Collaborative Decision Network for Safe Medication Recommendation
[52] Attention Multihop Graph and Multiscale Convolutional Fusion Network for Hyperspectral Image Classification
[53] Adaptive Neighbor Graph Aggregated Graph Attention Network for Heterogeneous Graph Embedding
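One way to picture how several graph hypotheses could be fused into a single query-aware structure is sketched below. It is an illustrative assumption rather than the CGT module itself: "Bayesian fusion" is approximated here by inverse-variance (precision) weighting, "attention-based reweighting" by a softmax over per-hypothesis query-relevance scores, and the names `softmax`, `fuse_graphs`, `variances`, and `query_scores` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_graphs(adjacencies, variances, query_scores):
    """Fuse K adjacency hypotheses into one matrix.

    Assumed weighting: w_k is proportional to softmax(query_scores)[k] / variances[k],
    so a hypothesis counts more when it is both relevant to the query
    (attention term) and low-uncertainty (precision term).
    """
    attention = softmax(query_scores)
    raw = [a / v for a, v in zip(attention, variances)]
    norm = sum(raw)
    weights = [r / norm for r in raw]

    n = len(adjacencies[0])
    fused = [[sum(w * adj[i][j] for w, adj in zip(weights, adjacencies))
              for j in range(n)]
             for i in range(n)]
    return fused, weights


# Toy usage: two 2-keypoint hypotheses with equal query relevance;
# the second is three times as uncertain as the first (values are made up).
hypotheses = [
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
fused, weights = fuse_graphs(hypotheses, variances=[1.0, 3.0],
                             query_scores=[0.0, 0.0])
```

Under this weighting, a noisy or mismatched support hypothesis is down-weighted twice, once through its variance and once through its low relevance score, which is the robustness property the contribution statement describes.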
State-of-the-art performance on MP-100 without external annotations
The authors report that GenCape achieves state-of-the-art results on the MP-100 benchmark in both 1-shot and 5-shot settings, outperforming existing methods without requiring external structural or textual annotations.