FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Animation, Gaussian Avatar, Feedforward Gaussian Model
Abstract:

Despite recent progress in 3D Gaussian-based head avatar modeling, efficiently generating high-fidelity avatars remains a challenge. Current methods typically rely on extensive multi-view capture setups or monocular videos with per-identity optimization during inference, limiting their scalability and ease of use on unseen subjects. To overcome these limitations, we propose FastGHA, a feed-forward method that generates high-quality Gaussian head avatars from only a few input images while supporting real-time animation. Our approach directly learns a per-pixel Gaussian representation from the input images and aggregates multi-view information using a transformer-based encoder that fuses image features from both DINOv3 and the Stable Diffusion VAE. For real-time animation, we extend the explicit Gaussian representations with per-Gaussian features and introduce a lightweight MLP-based dynamic network to predict 3D Gaussian deformations from expression codes. Furthermore, to enhance the geometric smoothness of the 3D head, we employ point maps from a pre-trained large reconstruction model as geometry supervision. Experiments show that our approach significantly outperforms existing methods in both rendering quality and inference efficiency, while supporting real-time dynamic avatar animation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes FastGHA, a feed-forward framework for generating 3D Gaussian head avatars from a few input images with real-time animation capability. It resides in the Transformer-Based Generalization leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader few-shot reconstruction landscape. This leaf focuses specifically on using transformer architectures to aggregate multi-view information and learn generalizable representations, distinguishing it from prior-based or single-image approaches in neighboring leaves.

The taxonomy reveals that FastGHA's immediate neighbors include Prior-Based Generalization methods that leverage learned 3D head priors from multi-view datasets, and Single-Image Reconstruction techniques that generate avatars from a single input. The broader Generalized Few-Shot Reconstruction Methods branch contrasts with Identity-Specific Reconstruction from Monocular Video, which requires per-identity optimization, and Multi-View Capture-Based Reconstruction, which demands extensive camera setups. FastGHA's transformer-based aggregation and cross-identity generalization position it at the intersection of efficiency and quality, diverging from both identity-specific optimization and multi-view capture paradigms.

Among the 30 candidates examined, the first contribution (FastGHA framework) shows one refutable candidate out of 10 examined, suggesting some prior work in generalized few-shot reconstruction exists but is limited in scope. The second contribution (lightweight MLP-based deformation network) has two refutable candidates among 10 examined, indicating more substantial overlap in real-time animation techniques. The third contribution (geometry prior regularization using VGGT) shows no refutable candidates among 10 examined, suggesting this specific supervision approach may be more novel within the limited search scope.

Based on the top-30 semantic matches examined, the work appears to occupy a moderately explored niche within transformer-based generalization for few-shot avatar reconstruction. The sparse taxonomy leaf (three papers) and limited refutation evidence suggest the specific combination of feed-forward transformer aggregation with real-time MLP deformation may offer incremental novelty, though the analysis does not cover the full breadth of related work in adjacent reconstruction paradigms or recent unpublished developments.

Taxonomy

Core-task Taxonomy Papers: 33
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 3

Research Landscape Overview

Core task: few-shot 3D Gaussian head avatar reconstruction with real-time animation. The field organizes around several complementary directions. Generalized Few-Shot Reconstruction Methods aim to build models that can reconstruct avatars from minimal input across different identities, often leveraging transformer architectures or learned priors to achieve cross-identity generalization. Identity-Specific Reconstruction from Monocular Video focuses on optimizing high-fidelity avatars for individual subjects using longer monocular sequences, trading generalization for per-identity detail. Multi-View Capture-Based Reconstruction exploits synchronized camera rigs to achieve photorealistic quality and relightability, as seen in works like Relightable gaussian codec avatars[1] and Gaussian Pixel Codec Avatars[24]. Audio-Driven Animation tackles the challenge of synthesizing realistic lip sync and facial motion from speech signals, with methods such as GaussianSpeech[22] and EAvatar[23].

Specialized Animation and Generation Tasks explore text-to-avatar synthesis, expression transfer, and other creative applications, while Cross-Domain Applications and Surveys provide broader context and holistic reviews of talking head generation. Within the generalized few-shot branch, a key tension emerges between speed and quality: some approaches prioritize real-time performance by distilling complex priors into efficient feed-forward networks, while others invest in richer geometric or appearance models at the cost of longer inference.

FastGHA[0] sits squarely in the transformer-based generalization cluster, emphasizing rapid reconstruction from sparse views by learning cross-identity patterns. This contrasts with nearby identity-specific methods like StreamME[3], which optimizes per-subject fidelity through iterative refinement, and with multi-view systems such as SEGA[2] that assume denser capture setups.
The trade-off between generalization and per-identity detail remains a central open question, as does the challenge of maintaining temporal coherence and expression fidelity under extreme view sparsity.

Claimed Contributions

FastGHA framework for generalized few-shot 3D Gaussian head avatar reconstruction

The authors introduce FastGHA, a feed-forward method that generates high-quality animatable 3D Gaussian head avatars from only a few input images. The framework learns per-pixel Gaussian representations and aggregates multi-view information using a transformer-based encoder that fuses features from DINOv3 and Stable Diffusion VAE, achieving superior reconstruction quality compared to existing approaches.

10 retrieved papers
Can Refute
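The per-pixel Gaussian prediction described above can be illustrated with a minimal NumPy sketch. All dimensions and the single linear head are hypothetical stand-ins (the paper's actual encoder is a transformer fusing DINOv3 and Stable Diffusion VAE features); the sketch only shows the idea of fusing two feature maps by channel concatenation and decoding one Gaussian per pixel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: DINOv3-like features (384 channels) and
# SD-VAE-like latents (4 channels), both assumed resampled to a
# common H x W grid. Random data stands in for real features.
H, W = 8, 8
dino = rng.standard_normal((H, W, 384)).astype(np.float32)
vae = rng.standard_normal((H, W, 4)).astype(np.float32)

# Fuse by channel concatenation, then a linear head predicts per-pixel
# Gaussian parameters: 3 position offsets, 3 log-scales, 4 quaternion
# components, 1 opacity logit, 3 RGB values = 14 channels per pixel.
fused = np.concatenate([dino, vae], axis=-1)              # (H, W, 388)
W_head = rng.standard_normal((388, 14)).astype(np.float32) * 0.01
params = fused @ W_head                                   # (H, W, 14)

offsets = params[..., 0:3]
scales = np.exp(params[..., 3:6])                         # positive scales
quats = params[..., 6:10]
quats = quats / np.linalg.norm(quats, axis=-1, keepdims=True)  # unit rotations
opacity = 1.0 / (1.0 + np.exp(-params[..., 10:11]))       # sigmoid to (0, 1)
colors = params[..., 11:14]

print(params.shape)  # (8, 8, 14): one 14-parameter Gaussian per pixel
```

The exponential, normalization, and sigmoid activations are the standard way to keep 3D Gaussian parameters in their valid ranges; a real implementation would replace the linear head with the paper's transformer encoder.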
Lightweight MLP-based deformation network for real-time animation

The authors design a lightweight multi-layer perceptron that extends explicit Gaussian representations with learnable per-Gaussian features and predicts 3D Gaussian deformations from FLAME expression codes. This enables real-time dynamic avatar animation by acting independently on each Gaussian point for efficient and parallelizable deformation.

10 retrieved papers
Can Refute
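A minimal NumPy sketch of the point-wise deformation idea follows. The layer sizes, random weights, and two-layer MLP are hypothetical; the point is that the shared expression code is broadcast to every Gaussian and the network acts on each point independently, which is what makes the deformation fully parallelizable.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: N Gaussians, F-dim learnable per-Gaussian
# features, E-dim expression code (FLAME-style), H hidden units.
N, F, E, H = 1000, 32, 50, 64

feats = rng.standard_normal((N, F)).astype(np.float32)  # per-Gaussian features
expr = rng.standard_normal(E).astype(np.float32)        # expression code

W1 = rng.standard_normal((F + E, H)).astype(np.float32) * 0.05
b1 = np.zeros(H, dtype=np.float32)
W2 = rng.standard_normal((H, 3)).astype(np.float32) * 0.05
b2 = np.zeros(3, dtype=np.float32)

def deform(feats, expr):
    # Broadcast the shared expression code to every Gaussian, so the
    # MLP processes each point independently of all others.
    tiled = np.broadcast_to(expr, (feats.shape[0], expr.shape[0]))
    x = np.concatenate([feats, tiled], axis=-1)          # (N, F + E)
    h = np.maximum(x @ W1 + b1, 0.0)                     # ReLU
    return h @ W2 + b2                                   # (N, 3) position deltas

delta = deform(feats, expr)
print(delta.shape)  # (1000, 3)
```

Because every Gaussian's deformation is a small independent matrix-vector computation, the whole pass batches into two matrix multiplies, which is why an MLP of this shape can run in real time on a GPU.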
Geometry prior regularization using VGGT for improved 3D consistency

The authors employ point maps predicted from a pre-trained large reconstruction model (VGGT) as geometry supervision during training. Unlike prior work that directly uses predicted point maps as input, this approach incorporates the geometry prior as a regularization loss to enhance geometric smoothness and robustness without propagating artifacts from the prior.

10 retrieved papers
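The distinction drawn above, using the prior's point maps as a loss rather than as input, can be sketched as a simple masked regularization term. The shapes, the L1 form, and the confidence mask are assumptions for illustration; random arrays stand in for the model's predicted Gaussian centers and the VGGT point maps.

```python
import numpy as np

rng = np.random.default_rng(2)

H, W = 16, 16
# Stand-ins: predicted per-pixel Gaussian centers from the feed-forward
# model, and point maps from a frozen geometry prior (both (H, W, 3)
# xyz maps assumed to live in the same coordinate frame).
pred_points = rng.standard_normal((H, W, 3)).astype(np.float32)
prior_points = pred_points + 0.05 * rng.standard_normal((H, W, 3)).astype(np.float32)
valid = rng.random((H, W)) > 0.1        # hypothetical confidence mask

def geometry_reg_loss(pred, prior, mask):
    # Masked per-pixel L1 between predicted centers and prior point map.
    # Used as an extra training loss alongside the photometric objective,
    # so prior artifacts are softly penalized rather than fed forward
    # into the reconstruction as they would be if used as input.
    diff = np.abs(pred - prior).sum(axis=-1)             # (H, W)
    return float((diff * mask).sum() / max(mask.sum(), 1))

loss = geometry_reg_loss(pred_points, prior_points, valid)
print(loss >= 0.0)
```

Weighting this term against the rendering loss controls how strongly the geometry prior smooths the head surface; a low weight keeps the prior advisory, consistent with the regularization framing above.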

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
