RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Robotics; Embodied AI; Simulation; Sim2Real; Bimanual Manipulation; Synthetic Data Generation
Abstract:

Synthetic data generation via simulation represents a promising approach for enhancing robotic manipulation. However, current synthetic datasets remain insufficient for robust bimanual control due to limited scalability in novel task generation and oversimplified simulations that inadequately capture real-world complexity. We present RoboTwin 2.0, a scalable framework for automated diverse synthetic data generation and unified evaluation for bimanual manipulation. We construct RoboTwin-OD, an object library of 731 instances across 147 categories with semantic and manipulation labels. Building on this, we design an expert data generation pipeline that utilizes multimodal large language models to synthesize task-execution code with simulation-in-the-loop refinement. To improve sim-to-real transfer, RoboTwin 2.0 applies structured domain randomization over five factors (clutter, lighting, background, tabletop height, language instructions). Using this approach, we instantiate 50 bimanual tasks across five robot embodiments. Experimental results demonstrate a 10.9% improvement in code-generation success rates. For downstream learning, vision-language-action models trained with our synthetic data achieve 367% performance improvements in the few-shot setting and 228% improvements in the zero-shot setting, relative to a 10-demo real-only baseline. We further evaluate multiple policies across 50 tasks with two difficulty settings, establishing a comprehensive benchmark to study policy performance. We release the generator, datasets, and code to support scalable research in robust bimanual manipulation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces RoboTwin 2.0, a framework for automated synthetic data generation and evaluation in bimanual manipulation. It resides in the 'Generative Digital Twin and Task Synthesis' leaf, which contains four papers including the original work. This leaf sits within the broader 'Synthetic Data Generation and Simulation Frameworks' branch, indicating a moderately populated research direction focused on automated task creation using generative models and LLMs. The taxonomy shows this is an active but not overcrowded area, with sibling papers like Robotwin and Robotwin Early exploring similar digital twin approaches for synthetic training data.

The taxonomy reveals neighboring research directions that contextualize this work. Adjacent leaves include 'Automated Demonstration Generation via Trajectory Augmentation' (three papers) and 'Data Augmentation Techniques' (two papers), which focus on augmenting existing demonstrations rather than generating novel tasks from scratch. The broader 'Learning and Control Approaches' branch (thirteen papers across four leaves) addresses downstream policy learning, while 'System Design and Benchmarking' (eleven papers) provides evaluation infrastructure. RoboTwin 2.0 bridges task synthesis with standardized benchmarking, connecting generative frameworks to evaluation platforms that assess sim-to-real transfer quality.

Among thirty candidates examined, each of the three contributions shows at least one refutable candidate. The MLLM-based expert data generation framework examined ten candidates with one potential overlap, suggesting some prior work in LLM-guided task synthesis exists but is not densely represented. The domain randomization strategy examined ten candidates with two refutable instances, indicating established precedent in structured randomization for sim-to-real transfer. The object library and benchmark contribution examined ten candidates with one overlap, suggesting that while object datasets exist, the specific combination of semantic labels, multi-embodiment support, and bimanual focus may offer differentiation within the limited search scope.

Based on the limited thirty-candidate search, the work appears to integrate established techniques—LLM-guided synthesis, domain randomization, object libraries—into a unified bimanual framework. The taxonomy position in a moderately populated leaf suggests incremental advancement within an active research direction rather than opening entirely new territory. The contribution-level statistics indicate partial overlap with prior work across all three claims, though the specific combination and bimanual focus may provide practical value. A more exhaustive literature search would be needed to definitively assess novelty beyond these top-K semantic matches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 4

Research Landscape Overview

Core task: bimanual robotic manipulation with synthetic data generation. The field organizes around five main branches that reflect distinct but interconnected challenges. Synthetic Data Generation and Simulation Frameworks address how to create realistic training environments and task variations at scale, often leveraging digital twins and procedural generation techniques such as those in Robotwin[2] and Robotwin Early[3]. Learning and Control Approaches encompass methods ranging from reinforcement learning to imitation learning and diffusion-based policies, exemplified by works like RDT Diffusion[1]. Motion Planning and Coordination focuses on trajectory optimization, collision avoidance, and synchronization between dual arms. Grasp and Motion Synthesis Methods tackle the generation of feasible bimanual grasps and coordinated motions, with contributions such as Bimanual Grasp Synthesis[11] and Artigrasp[6]. Finally, System Design and Benchmarking provides hardware platforms, datasets, and evaluation protocols to ground these algorithmic advances in real-world performance.

Within the synthetic data generation branch, a particularly active line of work explores generative digital twins and task synthesis, aiming to automatically produce diverse manipulation scenarios that transfer to physical systems. RoboTwin Two[0] sits squarely in this cluster, emphasizing scalable task generation for bimanual settings. It shares conceptual ground with Robotwin[2] and Robotwin Early[3], which similarly construct digital twin environments to synthesize training data, though RoboTwin Two[0] appears to push further on automating task diversity and bimanual coordination. Nearby efforts like Humanoidgen[7] and Momagen[8] also generate motion data but often target whole-body humanoid control rather than isolated dual-arm manipulation.
A key open question across these works is how to balance procedural diversity with physical plausibility, ensuring that synthetically trained policies remain robust when deployed on real hardware.

Claimed Contributions

Automated expert data generation framework with MLLM and simulation-in-the-loop feedback

The authors propose a closed-loop system that couples code generation with multimodal execution feedback. A code-generation agent translates natural language instructions into executable programs, while a vision-language model observer monitors execution in simulation, detects failures, and suggests corrections iteratively until success criteria are met.
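The generate-execute-observe-refine loop described above can be sketched as follows. This is a minimal illustrative sketch only; the function names (`generate_code`, `run_in_sim`, `critique`) and the feedback format are hypothetical placeholders, not the authors' actual API.

```python
# Hypothetical sketch of the simulation-in-the-loop refinement cycle:
# a code-generation agent proposes a program, the simulator executes it,
# and a VLM observer critiques failures until success or a round budget.

def synthesize_task_program(instruction, generate_code, run_in_sim, critique,
                            max_rounds=5):
    """Iteratively refine task-execution code until the simulated rollout
    meets the success criteria. Returns (program, rounds_used) on success,
    or (None, max_rounds) if the budget is exhausted."""
    feedback = None
    for round_idx in range(max_rounds):
        # 1. Code-generation agent: instruction (+ prior feedback) -> program.
        program = generate_code(instruction, feedback)
        # 2. Execute the candidate program in simulation, record the rollout.
        result = run_in_sim(program)
        if result["success"]:
            return program, round_idx + 1
        # 3. VLM observer inspects the rollout frames, localizes the
        #    failure, and proposes a correction for the next round.
        feedback = critique(instruction, program, result["frames"])
    return None, max_rounds
```

The key design point is that the observer closes the loop on *execution* evidence rather than on the generated code alone, which is what distinguishes simulation-in-the-loop refinement from one-shot code generation.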

10 retrieved papers
Can Refute
Systematic domain randomization strategy for policy robustness

The framework applies comprehensive domain randomization across five dimensions: scene clutter with task-irrelevant distractors, diverse background textures from an 11,000-image library, lighting variations in color and intensity, tabletop height randomization, and trajectory-level diverse language instructions generated by MLLMs.
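Sampling one randomized episode over these five factors might look like the sketch below. The field names and numeric ranges are assumptions for illustration and do not reflect the paper's actual parameters; only the five-factor structure and the 11,000-image background library come from the text above.

```python
# Illustrative per-episode domain randomization over the five factors:
# clutter, background, lighting, tabletop height, and language instruction.
import random

def sample_episode_config(background_library, instruction_pool, rng=None):
    """Draw one randomized scene configuration (all ranges are illustrative)."""
    rng = rng or random.Random()
    return {
        # Task-irrelevant distractor objects cluttering the scene.
        "num_distractors": rng.randint(0, 8),
        # Background texture drawn from a large image library
        # (the paper reports an 11,000-image library).
        "background": rng.choice(background_library),
        # Lighting randomized in both color and intensity.
        "light_color": [rng.uniform(0.6, 1.0) for _ in range(3)],
        "light_intensity": rng.uniform(0.3, 1.5),
        # Tabletop height perturbation (meters, illustrative range).
        "table_height": 0.75 + rng.uniform(-0.05, 0.05),
        # One of several MLLM-generated paraphrases of the instruction.
        "instruction": rng.choice(instruction_pool),
    }
```

Sampling all five factors independently per episode is what makes the randomization "structured": each factor can be ablated or re-ranged on its own when diagnosing sim-to-real gaps.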

10 retrieved papers
Can Refute
RoboTwin-OD object library, multi-embodiment dataset, scalable generator, and standardized benchmark

The authors introduce RoboTwin-OD comprising 731 annotated object instances across 147 categories with semantic and manipulation labels, a pre-collected dataset of over 100,000 expert trajectories spanning 50 bimanual tasks across five robot embodiments, and a comprehensive benchmark for evaluating policy generalization across tasks, scenes, and embodiments.
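To make "semantic and manipulation labels" concrete, a single RoboTwin-OD entry could be modeled as in the sketch below. Every field name here is invented for illustration and does not reflect the released data format; only the instance/category counts are taken from the paper.

```python
# Hypothetical record schema for one entry of an annotated object library,
# pairing a mesh asset with semantic labels and manipulation annotations.
from dataclasses import dataclass, field

@dataclass
class ObjectEntry:
    instance_id: str                  # unique id among the 731 instances
    category: str                     # one of the 147 categories, e.g. "mug"
    mesh_path: str                    # path to the object's mesh asset
    # Semantic labels, e.g. affordance or part tags usable in task synthesis.
    semantic_labels: list = field(default_factory=list)
    # Manipulation labels, e.g. candidate grasp axes in the object frame.
    grasp_axes: list = field(default_factory=list)

# Example (illustrative values):
entry = ObjectEntry("mug_017", "mug", "assets/mug_017.obj",
                    semantic_labels=["container", "graspable"],
                    grasp_axes=[[0.0, 0.0, 1.0]])
```

Keeping semantic and manipulation annotations on the same record is what lets an automated task generator both select objects by meaning and plan grasps without extra perception.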

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Automated expert data generation framework with MLLM and simulation-in-the-loop feedback

Contribution

Systematic domain randomization strategy for policy robustness

Contribution

RoboTwin-OD object library, multi-embodiment dataset, scalable generator, and standardized benchmark
