FullPart: Generating each 3D Part at Full Resolution

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: 3D Generation · Diffusion Model · Part Generation
Abstract:

Part-based 3D generation holds great potential for various applications. Previous part generators that represent parts with implicit vector-set tokens often suffer from insufficient geometric detail. Another line of work adopts an explicit voxel representation but shares a global voxel grid among all parts; this often leaves small parts with too few voxels, degrading their quality. In this paper, we propose FullPart, a novel framework that combines both implicit and explicit paradigms. It first derives the bounding-box layout through an implicit box vector-set diffusion process, a task that implicit diffusion handles effectively since box tokens contain little geometric detail. It then generates detailed parts, each within its own fixed full-resolution voxel grid. Instead of sharing a global low-resolution space, each part in our method, even a small one, is generated at full resolution, enabling the synthesis of intricate details. We further introduce a center-point encoding strategy to address the misalignment that arises when parts of different actual sizes exchange information, thereby maintaining global coherence. Moreover, to tackle the scarcity of reliable part data, we present PartVerse-XL, the largest human-annotated 3D part dataset to date. Extensive experiments demonstrate that FullPart achieves state-of-the-art results in 3D part generation. We will release all code, data, and models to benefit future research in 3D part generation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 33
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: part-based 3D object generation with full-resolution representation. The field organizes around several complementary directions. Part-Aware Representation and Decomposition Strategies focus on how to segment and model objects as assemblies of meaningful components, often leveraging latent diffusion or hierarchical encodings to capture part-level semantics. Full-Resolution and High-Fidelity Volumetric Approaches emphasize maintaining geometric detail at scale, exploring techniques such as dual volume packing (Dual Volume Packing[5]) or octree-based representations (OctGPT[10]) to balance memory and fidelity. Compositional and Multi-View Synthesis Methods address the challenge of generating coherent 3D content from multiple viewpoints or by composing learned part priors, with works like Category Aware Composition[1] and Sparc3D[2] illustrating how category-specific knowledge can guide assembly. Domain-Specific Part-Based Applications tailor these ideas to specialized contexts—ranging from human body modeling (GHUM[20]) and talking avatars (TalkingGaussian[13], PoseTalker[29]) to medical imaging (3D MRI Synthesis[27])—while Representation Learning and Encoding Foundations provide the underlying machinery, including variational autoencoders (Shape VAE[14]) and discrete tokenization schemes (Discrete Representation Learning[25]). A particularly active line of work explores part-level latent diffusion, where generative models operate in a structured latent space that respects object decomposition. Within this cluster, Contextual Part Latents[3] conditions diffusion on part relationships to ensure coherent assembly, while FullPart[0] extends this idea by maintaining full-resolution detail throughout the generation process, avoiding the loss of fine geometric features that can occur with coarser representations. 
Nearby efforts such as Diverse Part Synthesis[11] and Assembler[7] similarly emphasize compositional generation but may trade off resolution for broader part diversity or faster sampling. The central tension across these branches is between expressive part-level control and the computational cost of high-fidelity volumetric outputs. FullPart[0] sits at the intersection of contextual part modeling and full-resolution synthesis, aiming to preserve both semantic decomposition and geometric detail—a balance that distinguishes it from methods prioritizing either coarse part assembly or resolution alone.

Claimed Contributions

FullPart framework combining implicit and explicit paradigms

The authors introduce FullPart, a framework that first generates bounding box layouts using implicit vecset diffusion, then generates each part at full resolution within its own dedicated voxel grid using explicit representation. This design addresses limitations of prior methods by enabling fine geometric details while maintaining global coherence.
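The shared-grid limitation this design targets can be made concrete with a small back-of-the-envelope sketch. All numbers below (grid resolutions, the 5% part extent) are illustrative assumptions, not values from the paper:

```python
# Illustrative comparison: voxels available to a small part under a
# shared global grid vs. a dedicated per-part grid (assumed sizes).

def occupied_voxels(part_extent: float, grid_res: int) -> int:
    """Voxels along one axis covered by a part whose extent is given
    as a fraction of the grid's world size (at least one voxel)."""
    return max(1, round(part_extent * grid_res))

GLOBAL_RES = 64        # assumed shared-grid resolution
PART_RES = 64          # assumed dedicated per-part resolution
part_extent = 0.05     # a small part spanning 5% of the object

shared = occupied_voxels(part_extent, GLOBAL_RES) ** 3
dedicated = PART_RES ** 3  # the part fills its own full grid

print(shared, dedicated)  # prints: 27 262144
```

Under these assumed numbers, a small part captured by only 3³ = 27 voxels in a shared grid would receive the full 64³ budget in its own grid, which is the resolution gap the two-stage design exploits.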

10 retrieved papers
Center-corner encoding strategy for part coherence

The authors propose a center-corner encoding mechanism that embeds absolute spatial context for each voxel by encoding the positions of its center and eight corners in a unified super-high-resolution global coordinate system. This addresses the scale misalignment problem when parts of different sizes exchange information through attention mechanisms.
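One way to read this mechanism is sketched below. This is a minimal interpretation, not the authors' implementation: the coordinate conventions, the 512³ global resolution, and the function names are all assumptions made for illustration.

```python
import numpy as np

# Sketch of a center-corner encoding: each voxel of a part's local grid
# is tagged with the global positions of its center and eight corners,
# quantized to an assumed high-resolution shared coordinate frame.

GLOBAL_RES = 512  # assumed "super-high-resolution" global grid


def center_corner_encoding(box_min, box_size, part_res):
    """box_min, box_size: part bounding box in normalized [0, 1]^3
    global coordinates; part_res: per-part voxel resolution.
    Returns a (part_res**3, 27) array: 9 points x 3 coords per voxel."""
    box_min = np.asarray(box_min, dtype=np.float64)
    box_size = np.asarray(box_size, dtype=np.float64)
    voxel = box_size / part_res  # world-space size of one local voxel

    # Local voxel indices (i, j, k) for every voxel in the part grid.
    idx = np.stack(np.meshgrid(*[np.arange(part_res)] * 3,
                               indexing="ij"), axis=-1).reshape(-1, 3)

    # Offsets of the center and the 8 corners within a unit voxel.
    offsets = np.array([[0.5, 0.5, 0.5]] +
                       [[x, y, z] for x in (0, 1)
                                  for y in (0, 1)
                                  for z in (0, 1)], dtype=np.float64)

    # Global positions, snapped to the shared high-resolution grid.
    pts = box_min + (idx[:, None, :] + offsets[None, :, :]) * voxel
    return np.round(pts * GLOBAL_RES).reshape(-1, 27) / GLOBAL_RES
```

Because every part's encoding is expressed on the same global grid regardless of its bounding-box size, tokens from a small part and a large part carry commensurable absolute positions, which is what lets cross-part attention reason about scale and placement consistently.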

10 retrieved papers
PartVerse-XL dataset

The authors introduce PartVerse-XL, the largest human-annotated 3D part dataset to date, containing 40K objects and 320K parts with associated part-aware texture descriptions. The dataset was created through mesh pre-segmentation followed by human refinement to ensure high-quality, semantically consistent annotations.

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

FullPart framework combining implicit and explicit paradigms

The authors introduce FullPart, a framework that first generates bounding box layouts using implicit vecset diffusion, then generates each part at full resolution within its own dedicated voxel grid using explicit representation. This design addresses limitations of prior methods by enabling fine geometric details while maintaining global coherence.

Contribution

Center-corner encoding strategy for part coherence

The authors propose a center-corner encoding mechanism that embeds absolute spatial context for each voxel by encoding the positions of its center and eight corners in a unified super-high-resolution global coordinate system. This addresses the scale misalignment problem when parts of different sizes exchange information through attention mechanisms.

Contribution

PartVerse-XL dataset

The authors introduce PartVerse-XL, the largest human-annotated 3D part dataset to date, containing 40K objects and 320K parts with associated part-aware texture descriptions. The dataset was created through mesh pre-segmentation followed by human refinement to ensure high-quality, semantically consistent annotations.