RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Robotics; Embodied AI; Simulation; Sim2Real; Bimanual Manipulation; Synthetic Data Generation
Abstract:

Synthetic data generation via simulation represents a promising approach for enhancing robotic manipulation. However, current synthetic datasets remain insufficient for robust bimanual control due to limited scalability in novel task generation and oversimplified simulations that inadequately capture real-world complexity. We present RoboTwin 2.0, a scalable framework for automated diverse synthetic data generation and unified evaluation for bimanual manipulation. We construct RoboTwin-OD, an object library of 731 instances across 147 categories with semantic and manipulation labels. Building on this, we design an expert data generation pipeline that utilizes multimodal large language models to synthesize task-execution code with simulation-in-the-loop refinement. To improve sim-to-real transfer, RoboTwin 2.0 applies structured domain randomization over five factors (clutter, lighting, background, tabletop height, language instructions). Using this approach, we instantiate 50 bimanual tasks across five robot embodiments. Experimental results demonstrate a 10.9% improvement in code-generation success rates. For downstream learning, vision-language-action models trained with our synthetic data achieve 367% performance improvements in the few-shot setting and 228% improvements in the zero-shot setting, relative to a 10-demo real-only baseline. We further evaluate multiple policies across 50 tasks with two difficulty settings, establishing a comprehensive benchmark to study policy performance. We release the generator, datasets, and code to support scalable research in robust bimanual manipulation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces RoboTwin 2.0, a framework for automated synthetic data generation and evaluation in bimanual manipulation. It resides in the 'Generative Digital Twin and Task Synthesis' leaf, which contains four papers including the original work. This leaf sits within the broader 'Synthetic Data Generation and Simulation Frameworks' branch, indicating a moderately populated research direction focused on automated task creation using generative models and LLMs. The taxonomy shows this is an active but not overcrowded area, with sibling papers like Robotwin and Robotwin Early exploring similar digital twin approaches for synthetic training data.

The taxonomy reveals neighboring research directions that contextualize this work. Adjacent leaves include 'Automated Demonstration Generation via Trajectory Augmentation' (three papers) and 'Data Augmentation Techniques' (two papers), which focus on augmenting existing demonstrations rather than generating novel tasks from scratch. The broader 'Learning and Control Approaches' branch (thirteen papers across four leaves) addresses downstream policy learning, while 'System Design and Benchmarking' (eleven papers) provides evaluation infrastructure. RoboTwin 2.0 bridges task synthesis with standardized benchmarking, connecting generative frameworks to evaluation platforms that assess sim-to-real transfer quality.

Among thirty candidates examined, each of the three contributions shows at least one refutable candidate. The MLLM-based expert data generation framework examined ten candidates with one potential overlap, suggesting some prior work in LLM-guided task synthesis exists but is not densely represented. The domain randomization strategy examined ten candidates with two refutable instances, indicating established precedent in structured randomization for sim-to-real transfer. The object library and benchmark contribution examined ten candidates with one overlap, suggesting that while object datasets exist, the specific combination of semantic labels, multi-embodiment support, and bimanual focus may offer differentiation within the limited search scope.

Based on the limited thirty-candidate search, the work appears to integrate established techniques—LLM-guided synthesis, domain randomization, object libraries—into a unified bimanual framework. The taxonomy position in a moderately populated leaf suggests incremental advancement within an active research direction rather than opening entirely new territory. The contribution-level statistics indicate partial overlap with prior work across all three claims, though the specific combination and bimanual focus may provide practical value. A more exhaustive literature search would be needed to definitively assess novelty beyond these top-K semantic matches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 4

Research Landscape Overview

Core task: bimanual robotic manipulation with synthetic data generation. The field organizes around five main branches that reflect distinct but interconnected challenges. Synthetic Data Generation and Simulation Frameworks address how to create realistic training environments and task variations at scale, often leveraging digital twins and procedural generation techniques such as those in Robotwin[2] and Robotwin Early[3]. Learning and Control Approaches encompass methods ranging from reinforcement learning to imitation learning and diffusion-based policies, exemplified by works like RDT Diffusion[1]. Motion Planning and Coordination focuses on trajectory optimization, collision avoidance, and synchronization between dual arms. Grasp and Motion Synthesis Methods tackle the generation of feasible bimanual grasps and coordinated motions, with contributions such as Bimanual Grasp Synthesis[11] and Artigrasp[6]. Finally, System Design and Benchmarking provides hardware platforms, datasets, and evaluation protocols to ground these algorithmic advances in real-world performance.

Within the synthetic data generation branch, a particularly active line of work explores generative digital twins and task synthesis, aiming to automatically produce diverse manipulation scenarios that transfer to physical systems. RoboTwin Two[0] sits squarely in this cluster, emphasizing scalable task generation for bimanual settings. It shares conceptual ground with Robotwin[2] and Robotwin Early[3], which similarly construct digital twin environments to synthesize training data, though RoboTwin Two[0] appears to push further on automating task diversity and bimanual coordination. Nearby efforts like Humanoidgen[7] and Momagen[8] also generate motion data but often target whole-body humanoid control rather than isolated dual-arm manipulation.
A key open question across these works is how to balance procedural diversity with physical plausibility, ensuring that synthetically trained policies remain robust when deployed on real hardware.

Claimed Contributions

Automated expert data generation framework with MLLM and simulation-in-the-loop feedback

The authors propose a closed-loop system that couples code generation with multimodal execution feedback. A code-generation agent translates natural language instructions into executable programs, while a vision-language model observer monitors execution in simulation, detects failures, and suggests corrections iteratively until success criteria are met.
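The generate-execute-observe-refine loop described above can be sketched as follows. This is a minimal illustrative sketch only; the function names (`generate_code`, `run_in_sim`, `critique`) and the feedback format are hypothetical placeholders, not the authors' actual API.

```python
# Hypothetical sketch of the simulation-in-the-loop refinement cycle:
# a code-generation agent proposes a program, the simulator executes it,
# and a VLM observer critiques failures until success or a round budget.

def synthesize_task_program(instruction, generate_code, run_in_sim, critique,
                            max_rounds=5):
    """Iteratively refine task-execution code until the simulated rollout
    meets the success criteria. Returns (program, rounds_used) on success,
    or (None, max_rounds) if the budget is exhausted."""
    feedback = None
    for round_idx in range(max_rounds):
        # 1. Code-generation agent: instruction (+ prior feedback) -> program.
        program = generate_code(instruction, feedback)
        # 2. Execute the candidate program in simulation, record the rollout.
        result = run_in_sim(program)
        if result["success"]:
            return program, round_idx + 1
        # 3. VLM observer inspects the rollout frames, localizes the
        #    failure, and proposes a correction for the next round.
        feedback = critique(instruction, program, result["frames"])
    return None, max_rounds
```

The key design point is that the observer closes the loop on *execution* evidence rather than on the generated code alone, which is what distinguishes simulation-in-the-loop refinement from one-shot code generation.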

10 retrieved papers
Can Refute
Systematic domain randomization strategy for policy robustness

The framework applies comprehensive domain randomization across five dimensions: scene clutter with task-irrelevant distractors, diverse background textures from an 11,000-image library, lighting variations in color and intensity, tabletop height randomization, and trajectory-level diverse language instructions generated by MLLMs.
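Sampling one randomized episode over these five factors might look like the sketch below. The field names and numeric ranges are assumptions for illustration and do not reflect the paper's actual parameters; only the five-factor structure and the 11,000-image background library come from the text above.

```python
# Illustrative per-episode domain randomization over the five factors:
# clutter, background, lighting, tabletop height, and language instruction.
import random

def sample_episode_config(background_library, instruction_pool, rng=None):
    """Draw one randomized scene configuration (all ranges are illustrative)."""
    rng = rng or random.Random()
    return {
        # Task-irrelevant distractor objects cluttering the scene.
        "num_distractors": rng.randint(0, 8),
        # Background texture drawn from a large image library
        # (the paper reports an 11,000-image library).
        "background": rng.choice(background_library),
        # Lighting randomized in both color and intensity.
        "light_color": [rng.uniform(0.6, 1.0) for _ in range(3)],
        "light_intensity": rng.uniform(0.3, 1.5),
        # Tabletop height perturbation (meters, illustrative range).
        "table_height": 0.75 + rng.uniform(-0.05, 0.05),
        # One of several MLLM-generated paraphrases of the instruction.
        "instruction": rng.choice(instruction_pool),
    }
```

Sampling all five factors independently per episode is what makes the randomization "structured": each factor can be ablated or re-ranged on its own when diagnosing sim-to-real gaps.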

10 retrieved papers
Can Refute
RoboTwin-OD object library, multi-embodiment dataset, scalable generator, and standardized benchmark

The authors introduce RoboTwin-OD comprising 731 annotated object instances across 147 categories with semantic and manipulation labels, a pre-collected dataset of over 100,000 expert trajectories spanning 50 bimanual tasks across five robot embodiments, and a comprehensive benchmark for evaluating policy generalization across tasks, scenes, and embodiments.
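To make "semantic and manipulation labels" concrete, a single RoboTwin-OD entry could be modeled as in the sketch below. Every field name here is invented for illustration and does not reflect the released data format; only the instance/category counts are taken from the paper.

```python
# Hypothetical record schema for one entry of an annotated object library,
# pairing a mesh asset with semantic labels and manipulation annotations.
from dataclasses import dataclass, field

@dataclass
class ObjectEntry:
    instance_id: str                  # unique id among the 731 instances
    category: str                     # one of the 147 categories, e.g. "mug"
    mesh_path: str                    # path to the object's mesh asset
    # Semantic labels, e.g. affordance or part tags usable in task synthesis.
    semantic_labels: list = field(default_factory=list)
    # Manipulation labels, e.g. candidate grasp axes in the object frame.
    grasp_axes: list = field(default_factory=list)

# Example (illustrative values):
entry = ObjectEntry("mug_017", "mug", "assets/mug_017.obj",
                    semantic_labels=["container", "graspable"],
                    grasp_axes=[[0.0, 0.0, 1.0]])
```

Keeping semantic and manipulation annotations on the same record is what lets an automated task generator both select objects by meaning and plan grasps without extra perception.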

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Automated expert data generation framework with MLLM and simulation-in-the-loop feedback

Contribution

Systematic domain randomization strategy for policy robustness

Contribution

RoboTwin-OD object library, multi-embodiment dataset, scalable generator, and standardized benchmark
