Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Robotic Manipulation, Imitation Learning, 3D Representation, Generalizable Policy
Abstract:

Articulated object manipulation is essential for many real-world robotic tasks, yet generalizing across diverse objects remains a major challenge. A key to generalization lies in understanding functional parts (e.g., door handles and knobs), which indicate where and how to manipulate across diverse object categories and shapes. Previous works attempted to achieve generalization by introducing foundation features, but these features are mostly 2D-based and do not specifically model functional parts. Lifting such 2D features into geometry-rich 3D space raises further challenges, including long runtimes, multi-view inconsistencies, and low spatial resolution with insufficient geometric information. To address these issues, we propose the Part-Aware 3D Feature Field (PA3FF), a novel dense 3D feature with part awareness for generalizable articulated object manipulation. PA3FF is trained on 3D part proposals from large-scale labeled datasets via a contrastive learning formulation. Given a point cloud as input, PA3FF predicts a continuous 3D feature field in a feedforward manner, where the distance between point features reflects the proximity of functional parts: points with similar features are more likely to belong to the same part. Building on this feature, we introduce the Part-Aware Diffusion Policy (PADP), an imitation learning framework aimed at enhancing sample efficiency and generalization for robotic manipulation. We evaluate PADP on several simulated and real-world tasks, demonstrating that PA3FF consistently outperforms a range of 2D and 3D representations, including CLIP, DINOv2, and Grounded-SAM, achieving state-of-the-art performance. Beyond imitation learning, PA3FF supports diverse downstream methods, including correspondence learning and segmentation tasks, making it a versatile foundation for robotic manipulation.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Part-Aware 3D Feature Field (PA3FF), a dense continuous 3D feature representation trained via contrastive learning on large-scale part-annotated datasets, and Part-Aware Diffusion Policy (PADP) for manipulation. It resides in the 'Dense 3D Feature Fields for Part-Aware Manipulation' leaf, which contains only three papers including the original work. This is a relatively sparse research direction within the broader taxonomy of fifty papers, suggesting that continuous 3D feature fields specifically designed for part-aware manipulation remain an emerging area compared to more populated branches like cross-category generalization or articulation modeling.

The taxonomy reveals several neighboring research directions. The sibling leaf 'Cross-Category Part-Based Generalization' (four papers) emphasizes shared part semantics across categories, while 'Affordance and Actionable Part Learning' (three papers) focuses on predicting interaction points rather than dense fields. The parent branch 'Part-Aware Representation Learning for Manipulation' also includes 'Part-Level Instruction Following' and 'Superpoint and Hierarchical Part Representations', indicating that the field explores multiple granularities of part encoding. The paper's approach of learning continuous fields contrasts with discrete segmentation methods in 'Articulation Modeling and Motion Estimation', particularly 'Part Segmentation and Motion Decomposition' (five papers), which jointly segment and estimate motion parameters.

Among thirty candidates examined, the contrastive learning framework for part-aware features shows overlap with prior work: two refutable candidates were identified from ten examined. The PA3FF contribution itself (ten candidates examined, zero refutable) and PADP (ten candidates examined, zero refutable) appear more novel within the limited search scope. The statistics suggest that while the core feature field and policy components may be relatively unexplored in this specific formulation, the training methodology via contrastive learning on part proposals has more substantial precedent. The analysis does not claim exhaustive coverage; these findings reflect top-thirty semantic matches and their citation neighborhoods.

Based on the limited literature search, the work appears to occupy a sparsely populated niche at the intersection of dense 3D representations and part-aware manipulation. The taxonomy structure and contribution-level statistics suggest moderate novelty for the feature field and policy components, with the contrastive learning approach showing clearer connections to existing methods. The scope examined—thirty candidates across three contributions—provides a snapshot rather than definitive coverage of the field.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 30
Refutable papers: 2

Research Landscape Overview

Core task: generalizable articulated object manipulation using part-aware 3D features. The field organizes around several complementary branches that together address the challenge of enabling robots to interact with articulated objects such as doors, drawers, and cabinets. Part-Aware Representation Learning for Manipulation focuses on extracting dense geometric and semantic features that distinguish functional parts, often leveraging neural fields or point cloud encodings to ground manipulation policies. Articulation Modeling and Motion Estimation tackles the inverse problem of inferring kinematic structures and motion parameters from observations, while Manipulation Policy Learning and Execution develops control strategies that exploit these representations. Supporting branches include 3D Reconstruction and Pose Estimation for building object models, Generative Modeling of Articulated Objects for synthesis and data augmentation, and Simulation Environments and Benchmarks such as SAPIEN[28] and ArtiBench[49] that provide standardized testbeds. Specialized Manipulation Scenarios address domain-specific challenges such as deformable connections or temporal logic constraints.

Recent work reveals a tension between end-to-end learning approaches and modular pipelines that explicitly model part semantics and articulation structure. Dense feature field methods such as Part-Aware Dense Feature[0] and its close neighbors PartGS[23] and Part2GS[46] emphasize learning rich 3D representations that encode part boundaries and affordances directly from visual input, enabling generalization across object categories. In contrast, works such as GAPartNet[4] and PartManip[12] rely on explicit part segmentation and geometric reasoning to guide manipulation. Part-Aware Dense Feature[0] sits within the dense feature field cluster, sharing with PartGS[23] and Part2GS[46] an emphasis on continuous spatial encodings but differing in how part-level structure is regularized or supervised. This line of work contrasts with more structured approaches such as UniArt[5] and ArtFormer[10], which impose stronger priors on articulation types, highlighting an ongoing exploration of how much geometric inductive bias is necessary for robust generalization.

Claimed Contributions

Part-Aware 3D Feature Field (PA3FF)

A 3D-native representation that encodes dense, semantic, and functional part-aware features directly from point clouds in a feedforward manner. The feature field is trained using contrastive learning on 3D part proposals from large-scale labeled datasets, where feature proximity reflects functional part similarity.

10 retrieved papers
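The description above says that feature distance in PA3FF encodes part proximity, so points can be grouped into parts directly from their features. The sketch below illustrates that grouping rule only; the feature vectors, similarity threshold, and greedy assignment are hypothetical stand-ins, not the paper's actual inference code.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def group_by_feature(features, threshold=0.9):
    # Greedy grouping: a point joins the first group whose representative
    # feature is within the similarity threshold, mirroring the idea that
    # nearby features imply a shared functional part.
    groups = []  # list of (representative_feature, member_indices)
    for i, f in enumerate(features):
        for rep, members in groups:
            if cosine(f, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append((f, [i]))
    return [members for _, members in groups]

# Hypothetical per-point features: two handle points, one door-panel point.
feats = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0]]
print(group_by_feature(feats))  # → [[0, 1], [2]]
```

A learned field would produce far higher-dimensional features from the point cloud, but the same proximity rule applies.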
Part-Aware Diffusion Policy (PADP)

An imitation learning framework that integrates PA3FF with a diffusion policy architecture for action generation. PADP leverages the part-aware 3D features to achieve sample-efficient and generalizable manipulation behaviors across diverse objects.

10 retrieved papers
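PADP generates actions by iteratively denoising, with the denoiser conditioned on part-aware features. The following is a minimal sketch of that refinement loop under a toy noise predictor; the real denoiser is a learned network conditioned on PA3FF features, and the `target_action` conditioning dict here is purely illustrative.

```python
def toy_noise_predictor(action, cond):
    # Hypothetical stand-in for the learned denoiser: predicts noise as
    # the offset from a condition-dependent target action.
    target = cond["target_action"]
    return [a - t for a, t in zip(action, target)]

def denoise(action, cond, steps=50, rate=0.2):
    # Each step removes a fraction of the predicted noise, as in
    # DDPM-style iterative refinement (simplified: no noise schedule).
    for _ in range(steps):
        eps = toy_noise_predictor(action, cond)
        action = [a - rate * e for a, e in zip(action, eps)]
    return action

# In practice cond would carry the PA3FF features of the observed scene.
cond = {"target_action": [0.5, -0.2, 0.1]}
a = denoise([0.0, 0.0, 0.0], cond)
```

After enough steps the action converges toward the conditioned target; a real diffusion policy additionally injects scheduled noise and trains the predictor from demonstrations.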
Contrastive learning framework for part-aware features

A training approach combining a geometric loss (encouraging spatial consistency within parts) and a semantic loss (aligning point features with part-name embeddings from SigLIP) to enhance part awareness in the 3D feature field.

10 retrieved papers
Can Refute
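The two-term objective described for this contribution can be sketched as an InfoNCE-style geometric loss plus a cosine-alignment semantic loss. All vectors and the temperature below are illustrative; the paper's actual losses, feature dimensions, and SigLIP embeddings are not reproduced here.

```python
import math

def cos(u, v):
    d = sum(a * b for a, b in zip(u, v))
    return d / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def geometric_loss(anchor, positive, negatives, tau=0.1):
    # InfoNCE-style: pull same-part point features together,
    # push other-part features away.
    pos = math.exp(cos(anchor, positive) / tau)
    neg = sum(math.exp(cos(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

def semantic_loss(point_feature, text_embedding):
    # Align a point feature with its part-name text embedding
    # (SigLIP in the paper; any text encoder fits the sketch).
    return 1.0 - cos(point_feature, text_embedding)

anchor, positive = [1.0, 0.0], [0.95, 0.1]   # two points on the same part
negatives = [[0.0, 1.0]]                      # a point on a different part
text_emb = [1.0, 0.05]  # hypothetical embedding of the part name "handle"

total = geometric_loss(anchor, positive, negatives) + semantic_loss(anchor, text_emb)
```

Well-separated part features yield a much lower geometric loss than collapsed features, which is the signal the contrastive training exploits.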

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Part-Aware 3D Feature Field (PA3FF)

A 3D-native representation that encodes dense, semantic, and functional part-aware features directly from point clouds in a feedforward manner. The feature field is trained using contrastive learning on 3D part proposals from large-scale labeled datasets, where feature proximity reflects functional part similarity.

Contribution

Part-Aware Diffusion Policy (PADP)

An imitation learning framework that integrates PA3FF with a diffusion policy architecture for action generation. PADP leverages the part-aware 3D features to achieve sample-efficient and generalizable manipulation behaviors across diverse objects.

Contribution

Contrastive learning framework for part-aware features

A training approach combining a geometric loss (encouraging spatial consistency within parts) and a semantic loss (aligning point features with part-name embeddings from SigLIP) to enhance part awareness in the 3D feature field.