MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Data Generation for Robot Learning · Bimanual Mobile Manipulation · Imitation Learning for Robotics
Abstract:

Imitation learning from large-scale, diverse human demonstrations has proven effective for training robots, but collecting such data is costly and time-consuming. This challenge is amplified for multi-step bimanual mobile manipulation, where humans must teleoperate both a mobile base and two high-degree-of-freedom arms. Prior automated data generation frameworks have addressed static bimanual manipulation by augmenting a few human demonstrations in simulation, but they fall short for mobile settings due to two key challenges: (1) determining base placement to ensure reachability, and (2) positioning the camera to provide sufficient visibility for visuomotor policies. To address these issues, we introduce MoMaGen, which formulates data generation as a constrained optimization problem that enforces hard constraints (e.g., reachability) while balancing soft constraints (e.g., visibility during navigation). This formulation generalizes prior approaches and provides a principled foundation for future methods. We evaluate MoMaGen on four multi-step bimanual mobile manipulation tasks and show that it generates significantly more diverse datasets than existing methods. Leveraging this diversity, MoMaGen can train successful imitation learning policies from a single source demonstration, and these policies can be fine-tuned with as few as 40 real-world demonstrations to achieve deployment on physical robotic hardware. More details are available at our project page: momagen-iclr2026.github.io.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.

Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces MoMaGen, a framework for automatically generating demonstration data for multi-step bimanual mobile manipulation tasks through constrained optimization. It resides in the Simulation-Based Demonstration Synthesis leaf, which contains four papers total, indicating a moderately populated research direction. The sibling papers include DexMimicGen, RoboTwin, and one other work, all addressing automated demonstration synthesis in simulation. This leaf sits within the broader Demonstration Generation and Data Synthesis branch, suggesting the paper targets a recognized but not overcrowded problem space where scalable data generation remains an active challenge.

The taxonomy reveals neighboring research directions that contextualize MoMaGen's positioning. The adjacent Real-World Demonstration Augmentation leaf contains methods that expand a small set of human demonstrations rather than synthesizing data purely in simulation. Downstream, the Imitation Learning for Bimanual Manipulation branch shows how generated demonstrations feed into policy learning architectures. The Mobile Manipulation Integration branch addresses navigation-manipulation coordination, which MoMaGen explicitly targets through its base placement and camera visibility constraints. The paper appears to bridge simulation-based synthesis with mobile manipulation challenges, connecting two previously separate research threads in the taxonomy.

Among thirty candidates examined across three contributions, none were identified as clearly refuting the proposed work. The MoMaGen framework contribution examined ten candidates with zero refutable matches, as did the unified constrained optimization formulation and the reachability-visibility constraints. This suggests that within the limited search scope, the specific combination of constrained optimization for mobile bimanual demonstration generation appears relatively unexplored. However, the analysis explicitly notes this reflects top-K semantic search results rather than exhaustive coverage, meaning potentially relevant prior work in adjacent optimization-based planning domains may exist outside the examined candidate set.

Based on the limited literature search, the work appears to occupy a distinct position, combining simulation-based synthesis with mobile manipulation constraints. The taxonomy structure shows related work in static bimanual synthesis and separate efforts in mobile manipulation coordination, but the examined candidates did not reveal direct overlap with MoMaGen's integrated approach. The analysis acknowledges its limited scope: it examined thirty semantically similar papers rather than the field comprehensively, and adjacent constraint-based planning and trajectory optimization work may contain relevant methodological precedents.

Taxonomy

Core-task Taxonomy Papers: 42
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Generating demonstrations for multi-step bimanual mobile manipulation. The field addresses the challenge of creating training data for robots that must coordinate two arms while navigating through environments to complete complex, sequential tasks. The taxonomy reveals several complementary research directions:

- Demonstration Generation and Data Synthesis focuses on creating scalable training data through simulation and automated methods, exemplified by works like DexMimicGen[5] and RoboTwin[14].
- Data Collection Systems and Benchmarks establishes standardized platforms and datasets such as Bigym[3] and Mobile ALOHA[2].
- Imitation Learning for Bimanual Manipulation develops policies that learn coordinated dual-arm behaviors from demonstrations.
- Task and Motion Planning tackles the symbolic and geometric reasoning required for long-horizon tasks.
- Reinforcement Learning and Optimization-Based Control explores learning-based and model-based approaches to skill acquisition.
- Mobile Manipulation Integration addresses the unique challenges of combining locomotion with manipulation.

A central tension emerges between simulation-based synthesis approaches, which promise scalability, and real-world data collection, which captures physical nuances. Within Demonstration Generation, some works pursue fully automated synthesis pipelines while others, like Mobile ALOHA[12], emphasize teleoperation systems for high-quality human demonstrations. MoMaGen[0] sits within the Simulation-Based Demonstration Synthesis cluster alongside DexMimicGen[5] and RoboTwin[14], focusing on automated generation of diverse bimanual mobile manipulation trajectories in simulation. Compared to DexMimicGen[5], which emphasizes dexterous in-hand manipulation, MoMaGen[0] appears to tackle the broader integration of mobility with dual-arm coordination. The interplay between these synthesis methods and imitation learning architectures like RDT-1B[1] highlights ongoing questions about how to bridge the sim-to-real gap while maintaining demonstration diversity and task coverage.

Claimed Contributions

MoMaGen framework for bimanual mobile manipulation data generation

The authors introduce MoMaGen, a framework that formulates automated demonstration generation for bimanual mobile manipulation as a constrained optimization problem. This formulation addresses reachability and visibility challenges unique to mobile manipulators by incorporating both hard constraints that must be satisfied and soft constraints that are optimized.

Candidate papers compared: 10
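To make the formulation concrete, the generation step can be read as a generic constrained optimization problem. The notation below is an illustrative reconstruction rather than the paper's own: q is a sampled configuration (base placement, arm configurations, camera pose), the g_i are hard constraints, and the f_j are soft costs with weights w_j.

```latex
\min_{q \in \mathcal{Q}} \; \sum_{j} w_j \, f_j(q)
\qquad \text{subject to} \qquad g_i(q) \le 0 \quad \forall i
```

Hard constraints gate whether a sample is kept at all (e.g., reachability, visibility during manipulation), while the soft costs rank the feasible samples (e.g., visibility during navigation, retraction).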
Unified constrained optimization formulation for X-Gen methods

The authors provide a unified framework that interprets existing X-Gen family methods (MimicGen, SkillMimicGen, DexMimicGen) as instances of constrained optimization with different constraint sets. This generalization offers a principled foundation for understanding and developing automated data generation approaches.

Candidate papers compared: 10
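Below is a minimal sketch of how this unified reading might look in code, under the assumption that each X-Gen method is characterized purely by its constraint sets; every name here (GenerationProblem, collision_free, and so on) is a hypothetical illustration, not the paper's or any library's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Config = Dict  # stand-in for a sampled configuration (base pose, arms, camera)

@dataclass
class GenerationProblem:
    """A demonstration-generation method viewed as constrained optimization:
    a sample is feasible only if every hard constraint holds; feasible
    samples are then ranked by a weighted sum of soft costs."""
    hard: List[Callable[[Config], bool]]
    soft: List[Tuple[float, Callable[[Config], float]]]

    def score(self, q: Config) -> float:
        if not all(check(q) for check in self.hard):
            return float("inf")  # hard-constraint violation: reject outright
        return sum(w * cost(q) for w, cost in self.soft)

# Placeholder predicates and costs (illustrative stubs only).
def collision_free(q): return True
def reachable(q): return True
def visible_during_manipulation(q): return True
def navigation_visibility_cost(q): return 0.0
def retraction_cost(q): return 0.0

# A static-manipulation method would populate only a small hard set,
# while a mobile bimanual method adds reachability/visibility terms.
static_gen = GenerationProblem(hard=[collision_free], soft=[])
mobile_gen = GenerationProblem(
    hard=[collision_free, reachable, visible_during_manipulation],
    soft=[(1.0, navigation_visibility_cost), (1.0, retraction_cost)],
)
```

Under this reading, MimicGen, SkillMimicGen, and DexMimicGen differ only in which predicates populate the hard set and which costs populate the soft set; score(q) returns infinity whenever any hard constraint fails, so infeasible samples are discarded before soft costs are ever weighed.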
Novel reachability and visibility constraints for mobile manipulation

The authors introduce several technical innovations: reachability as a hard constraint to ensure manipulability; object visibility during manipulation as a hard constraint for visuomotor policy training; object visibility during navigation as a soft constraint; and retraction as a soft constraint to promote safe navigation after manipulation.

Candidate papers compared: 10
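As a rough illustration of the two kinds of hard checks, the sketch below tests reachability with an inverse-kinematics query and visibility with a camera-frustum test. The solver interface and geometry helpers are assumptions for illustration, not the paper's implementation; a full visibility check would also account for occlusion, and the same frustum test averaged along a planned path could serve as the soft navigation-visibility cost.

```python
import numpy as np

def is_reachable(base_pose: np.ndarray, target_pose: np.ndarray, ik_solver) -> bool:
    """Hard constraint: the arm must admit an IK solution for the grasp
    pose expressed in the frame of the sampled base placement.
    `ik_solver` is an assumed interface returning joint angles or None."""
    target_in_base = np.linalg.inv(base_pose) @ target_pose  # 4x4 homogeneous poses
    return ik_solver.solve(target_in_base) is not None

def is_visible(point_world: np.ndarray, cam_pose: np.ndarray,
               fov_rad: float, near: float = 0.1, far: float = 5.0) -> bool:
    """Visibility test: the object point must lie inside the camera's
    viewing frustum (modeled here as a simple cone around the optical axis)."""
    p_cam = (np.linalg.inv(cam_pose) @ np.append(point_world, 1.0))[:3]
    depth = p_cam[2]  # camera looks down its +z axis by convention
    if not (near < depth < far):
        return False
    angle = np.arccos(depth / np.linalg.norm(p_cam))  # angle off the optical axis
    return angle < fov_rad / 2.0
```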

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Each of the three claimed contributions, restated above, was compared against its ten retrieved candidate papers, and none of the candidates was judged to refute it:

1. MoMaGen framework for bimanual mobile manipulation data generation
2. Unified constrained optimization formulation for X-Gen methods
3. Novel reachability and visibility constraints for mobile manipulation