Animating the Uncaptured: Humanoid Mesh Animation with Video Diffusion Models
Claimed Contributions
The authors propose an approach that synthesizes 4D animated sequences from static 3D humanoid meshes by leveraging motion priors from generative video models. Given a static mesh and a text prompt, they generate a video with a text-to-video (T2V) diffusion model and transfer the depicted motion to the mesh.
The authors develop a tracking pipeline that extracts 2D body landmarks, silhouettes, and dense DINOv2 features from the generated video frames and combines these cues to reconstruct the depicted motion and transfer it to the input mesh, using SMPL as a deformation proxy.
The authors introduce a method that registers the SMPL body model to the input mesh and reparameterizes mesh vertices with barycentric coordinates relative to SMPL faces, enabling motion transfer by optimizing SMPL parameters while preserving the mesh's structure.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Towards Motion from Video Diffusion Models
[12] MotionDreamer: Zero-Shot 3D Mesh Animation from Video Diffusion Models
[32] MotionDreamer: Exploring Semantic Video Diffusion Features for Zero-Shot 3D Mesh Animation
Contribution Analysis
Detailed comparisons for each claimed contribution
Text-to-motion generation approach leveraging video diffusion models
The authors propose an approach that synthesizes 4D animated sequences from static 3D humanoid meshes by leveraging motion priors from generative video models. Given a static mesh and a text prompt, they generate a video with a text-to-video (T2V) diffusion model and transfer the depicted motion to the mesh.
[1] Towards Motion from Video Diffusion Models
[6] AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
[12] MotionDreamer: Zero-Shot 3D Mesh Animation from Video Diffusion Models
[11] Animate3D: Animating Any 3D Model with Multi-View Video Diffusion
[39] CT4D: Consistent Text-to-4D Generation with Animatable Meshes
[51] FusionDeformer: Text-Guided Mesh Deformation Using Diffusion Models
[52] AnimateMe: 4D Facial Expressions via Diffusion Models
[53] Articulated Kinematics Distillation from Video Diffusion Models
[54] Shape-Conditioned Human Motion Diffusion Model with Mesh Representation
[55] TADA! Text to Animatable Digital Avatars
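The transfer step behind this contribution — fitting per-frame pose parameters so the posed body reprojects onto landmarks detected in the generated video — can be illustrated with a toy sketch. This is not the authors' implementation: the one-parameter "arm", the orthographic projection, and the grid-search fit (`pose_arm`, `fit_frame`) are all simplifying assumptions standing in for a full SMPL pose optimization.

```python
import numpy as np

def project(joints3d):
    # Orthographic projection onto the image plane (drop depth).
    return joints3d[:, :2]

def pose_arm(theta):
    # Toy 2-joint "arm": shoulder at the origin, elbow, wrist,
    # rotated in-plane by a single pose parameter theta.
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
    return rest @ R.T

def fit_frame(obs2d, grid=np.linspace(-np.pi, np.pi, 2001)):
    # Pick the pose parameter minimizing the 2D reprojection error;
    # a real pipeline would use gradient-based optimization instead.
    errs = [np.sum((project(pose_arm(t)) - obs2d) ** 2) for t in grid]
    return grid[int(np.argmin(errs))]

theta_true = 0.7
obs = project(pose_arm(theta_true))  # stands in for landmarks detected in a video frame
theta_hat = fit_frame(obs)           # recovers theta_true up to grid resolution
```

Run per frame, the recovered parameters form the motion sequence that is then applied to the input mesh through the deformation proxy.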
Robust motion tracking pipeline combining multiple cues
The authors develop a tracking pipeline that extracts 2D body landmarks, silhouettes, and dense DINOv2 features from the generated video frames and combines these cues to reconstruct the depicted motion and transfer it to the input mesh, using SMPL as a deformation proxy.
[56] Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation
[57] Beyond Sparse Keypoints: Dense Pose Modeling for Robust Gait Recognition
[58] SKEP-Net: Depth-Based Human Pose Monitoring and Exercise Recognition Using GMM-Segmentation
[59] Aerial Insights: Deep Learning-Based Human Action Recognition in Drone Imagery
[60] Recovering 3D Human Body Configurations Using Shape Contexts
[61] Depth Map-Based Human Activity Tracking and Recognition Using Body Joints Features and Self-Organized Map
[62] BodySLAM: Joint Camera Localisation, Mapping, and Human Motion Tracking
[63] MiShape
[64] Nonrigid Motion Analysis: Articulated and Elastic Motion
[65] Precision Tracking via Joint Detailed Shape Estimation of Arbitrary Extended Objects
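The multi-cue combination in this contribution amounts to a weighted tracking objective over the three signals. A minimal sketch, assuming a simple weighted sum (the function name `tracking_loss`, the specific per-term formulations, and the weights are illustrative, not the paper's):

```python
import numpy as np

def tracking_loss(pred_kp, obs_kp, pred_mask, obs_mask, pred_feat, obs_feat,
                  w_kp=1.0, w_sil=1.0, w_feat=0.5):
    # 2D landmark term: mean squared pixel error between projected and detected joints.
    l_kp = np.mean(np.sum((pred_kp - obs_kp) ** 2, axis=-1))
    # Silhouette term: 1 - IoU between the rendered and observed masks.
    inter = np.logical_and(pred_mask, obs_mask).sum()
    union = np.logical_or(pred_mask, obs_mask).sum()
    l_sil = 1.0 - inter / max(union, 1)
    # Dense feature term: mean cosine distance between per-pixel descriptors
    # (DINOv2 features in the paper's pipeline).
    num = np.sum(pred_feat * obs_feat, axis=-1)
    den = (np.linalg.norm(pred_feat, axis=-1)
           * np.linalg.norm(obs_feat, axis=-1) + 1e-8)
    l_feat = np.mean(1.0 - num / den)
    return w_kp * l_kp + w_sil * l_sil + w_feat * l_feat

# Synthetic data: the loss is near zero for a perfect match and grows with error.
rng = np.random.default_rng(0)
kp = rng.normal(size=(17, 2))
mask = rng.random((8, 8)) > 0.5
feat = rng.normal(size=(8, 8, 16))
zero_loss = tracking_loss(kp, kp, mask, mask, feat, feat)
off_loss = tracking_loss(kp + 1.0, kp, ~mask, mask, feat, feat)
```

Combining the terms hedges each cue's failure mode: landmarks are sparse but well-localized, silhouettes constrain body shape, and dense features disambiguate left/right and self-occlusion.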
SMPL-based deformation proxy with barycentric reparameterization
The authors introduce a method that registers the SMPL body model to the input mesh and reparameterizes mesh vertices with barycentric coordinates relative to SMPL faces, enabling motion transfer by optimizing SMPL parameters while preserving the mesh's structure.
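The barycentric reparameterization can be sketched for a single vertex and proxy face. This is a minimal illustration, not the authors' code: it assumes one SMPL triangle per mesh vertex and ignores the normal-offset component a full system would need for vertices off the proxy surface.

```python
import numpy as np

def barycentric_coords(p, tri):
    # Barycentric coordinates of point p w.r.t. triangle tri (rows a, b, c).
    # Solves p - c = u*(a - c) + v*(b - c) in the least-squares sense,
    # so points slightly off the plane project onto it.
    a, b, c = tri
    T = np.column_stack([a - c, b - c])              # 3x2 edge matrix
    (u, v), *_ = np.linalg.lstsq(T, p - c, rcond=None)
    return np.array([u, v, 1.0 - u - v])             # weights for a, b, c

def reconstruct(w, tri):
    # Recover the vertex from its fixed weights and the current triangle.
    return w @ tri

# Rest-pose SMPL proxy face and a detailed-mesh vertex lying on it.
tri_rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
v = np.array([0.25, 0.25, 0.0])
w = barycentric_coords(v, tri_rest)                  # computed once, then frozen

# Pose the proxy face (here a rigid 45-degree rotation stands in for SMPL posing).
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
tri_posed = tri_rest @ R.T
v_posed = reconstruct(w, tri_posed)  # equals R @ v: the vertex rides with the face
```

Because the weights are fixed at registration time, optimizing SMPL pose parameters moves the proxy faces and the detailed mesh follows deterministically, which is what keeps the mesh structure intact during motion transfer.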