D-REX: Differentiable Real-to-Sim-to-Real Engine for Learning Dexterous Grasping

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Real-to-Sim-to-Real; Differentiable Simulation; Learning Robotic Policies from Videos; System Identification
Abstract:

Simulation provides a cost-effective and flexible platform for data generation and policy learning in the development of robotic systems. However, bridging the gap between simulated and real-world dynamics remains a significant challenge, especially for physical parameter identification. In this work, we introduce a real-to-sim-to-real engine that leverages Gaussian Splat representations to build a differentiable engine, enabling object mass identification from real-world visual observations and robot control signals while simultaneously learning grasping policies. By optimizing the mass of the manipulated object, our method automatically builds high-fidelity, physically plausible digital twins. Additionally, we propose a novel approach that trains force-aware grasping policies from limited data by transferring feasible human demonstrations into simulated robot demonstrations. Through comprehensive experiments, we demonstrate that our engine achieves accurate and robust mass identification across various object geometries and mass values. The identified masses in turn facilitate force-aware policy learning, achieving superior performance in object grasping and effectively reducing the sim-to-real gap. Our code is included in the Supplementary Material and will be open-sourced to facilitate reproducibility. An anonymous project page is available at https://robot-drex-engine.github.io.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a differentiable real-to-sim-to-real framework that identifies object mass from visual observations and robot control signals while simultaneously learning force-aware grasping policies. It resides in the Force-Aware and Compliant Manipulation leaf under Multi-Modal Sensing and Fusion. Notably, this leaf contains only one paper in the taxonomy (the original submission itself), indicating a relatively sparse research direction within the broader field of fifty surveyed works. This positioning suggests the work addresses a niche intersection of physical parameter identification and force-aware policy learning.

The taxonomy reveals that neighboring leaves focus on Tactile-Visual Integration (three papers) and broader Vision-Based Deep Reinforcement Learning branches (multiple subtopics with two to four papers each). The scope note for Force-Aware and Compliant Manipulation explicitly includes force control and compliance for adaptive grasping, excluding purely visual or tactile methods. The paper's differentiable simulation approach connects to the Sim-to-Real Policy Transfer leaf (one paper) and contrasts with purely vision-driven methods in Closed-Loop Vision-Based Control (three papers). This structural context highlights that force-aware manipulation remains less explored than tactile-vision fusion or standard visual reinforcement learning.

Among the twenty-nine candidates examined, the contribution-level statistics reveal varying degrees of prior overlap. For the differentiable real-to-sim-to-real framework, ten candidates were examined, three of which appear to provide overlapping prior work. For force-aware policy learning from human demonstrations, ten candidates were examined, with one refutable match. For end-to-end mass identification through differentiable simulation, nine candidates were examined, five of which show potential overlap. These numbers indicate that, within the limited search scope, several existing works address related parameter-identification or force-aware-learning problems, though the specific combination of Gaussian Splat representations with simultaneous mass identification and policy learning may offer a distinct integration.

Based on the top-thirty semantic matches examined, the work appears to occupy a moderately explored niche. The taxonomy structure confirms that force-aware manipulation is less crowded than tactile-vision fusion or standard visual reinforcement learning. However, the contribution-level statistics suggest that the individual technical components (differentiable simulation, mass identification, force-aware policies) have precedents in the examined literature. The analysis does not exhaustively cover domain-specific venues or recent preprints beyond the candidate set, leaving open the question of incremental versus transformative novelty.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 9

Research Landscape Overview

Core task: learning dexterous grasping from visual observations and robot control signals. The field organizes around several complementary branches that reflect different methodological emphases and problem settings. Vision-Based Deep Reinforcement Learning and Simulation-to-Reality Transfer focus on end-to-end policy learning, often leveraging large-scale synthetic data and domain randomization to bridge the sim-to-real gap, as seen in works like QT-Opt[4] and Scalable Vision Manipulation[3]. Learning from Human Demonstrations and Priors emphasizes imitation and teleoperation to bootstrap policies efficiently, while Multi-Modal Sensing and Fusion integrates tactile, force, and proprioceptive signals alongside vision to enable compliant and adaptive manipulation. Vision-Language-Action Models and Foundational Methods explore how pre-trained representations and language grounding can generalize across tasks, with recent efforts like Dexterous Arma-Hand VLA[19] and RoboDexVLM[27] pushing toward unified architectures. Meanwhile, Specialized Dexterous Manipulation Tasks and Application-Specific branches address domain constraints in areas such as assembly, deformable object handling, and assistive robotics, and Emerging Paradigms investigate active perception and next-generation sensing modalities.

Within this landscape, a particularly active line of work centers on fusing multiple sensory modalities to achieve robust, contact-rich manipulation. D-REX[0] sits squarely in the Multi-Modal Sensing and Fusion branch under Force-Aware and Compliant Manipulation, emphasizing the integration of visual feedback with force or tactile cues to handle delicate grasping scenarios. This contrasts with purely vision-driven approaches like Vision Deep RL Grasping[10] or Simulated Depth Grasping[42], which rely on depth or RGB alone, and complements recent tactile-vision fusion methods such as ViTacFormer[31] and See to Touch[38]. Compared to Compliant Vision Demonstration[21], which also targets compliant control but leans on human demonstrations, D-REX[0] explores how force-aware policies can be learned more autonomously from multi-modal observations. The central trade-off across these branches remains balancing sensor complexity, data efficiency, and generalization: while richer modalities promise finer control, they also raise challenges in sensor calibration, sim-to-real transfer, and scalable data collection.

Claimed Contributions

Differentiable real-to-sim-to-real framework for object mass identification

The authors propose a framework that combines Gaussian Splat representations with differentiable physics simulation to identify object mass from visual observations and robot control signals. This enables automatic construction of high-fidelity, physically plausible digital twins through end-to-end optimization.

10 retrieved papers
Can Refute
Force-aware grasping policy learning from human demonstrations

The authors introduce a method that transfers human demonstrations into robot-executable trajectories in simulation and trains policies that combine position and force control conditioned on identified object mass. This hybrid control approach enables robust grasping across varying object masses.

10 retrieved papers
Can Refute
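To illustrate why conditioning on an identified mass matters for this contribution, the following is a hypothetical hybrid position/force command for a simple two-finger friction grasp. It is not the paper's learned policy: `grasp_command`, the proportional gain, and the friction and safety constants are assumptions chosen purely for the sketch. The grip-force term shows how a heavier identified mass directly raises the commanded normal force.

```python
def grasp_command(target_pos, current_pos, mass,
                  kp=5.0, mu=0.8, g=9.81, safety=1.5):
    """Hypothetical hybrid command for a two-finger friction grasp.

    Combines a proportional position term with a grip-force setpoint
    derived from the identified object mass. All gains and constants
    are illustrative assumptions, not values from the paper.
    """
    # Proportional position command toward the target.
    position_cmd = kp * (target_pos - current_pos)
    # Minimum total normal force so that friction at two contacts
    # supports the object's weight: 2 * mu * F_n >= m * g,
    # scaled by a safety margin.
    grip_force = safety * mass * g / (2.0 * mu)
    return position_cmd, grip_force

# A 2 kg object demands roughly 18.4 N of grip under these constants.
pos_cmd, grip = grasp_command(target_pos=0.3, current_pos=0.1, mass=2.0)
print(pos_cmd, grip)
```

In the paper's setting the mapping from mass to force is learned rather than hand-derived, but the sketch makes the dependency explicit: underestimating mass yields too little grip force (slip), while overestimating it risks crushing delicate objects.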
End-to-end mass identification through differentiable simulation

The framework leverages differentiable physics engines to optimize object mass by minimizing trajectory discrepancies between simulation and real-world robot-object interactions. Unlike prior methods requiring manually specified forces, this approach uses consistent robotic control signals for end-to-end optimization.

9 retrieved papers
Can Refute
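To make the claimed optimization concrete, here is a minimal 1-D sketch of mass identification by gradient descent through a differentiable rollout. It is not the authors' implementation, which couples Gaussian Splat rendering with a full differentiable physics engine; `simulate` and `identify_mass` are illustrative names, and the closed-form gradient exploits the fact that, for zero initial state, a point-mass trajectory is linear in the inverse mass.

```python
import numpy as np

def simulate(mass, forces, dt=0.01):
    """Roll out a 1-D point mass under known applied forces (semi-implicit Euler)."""
    v = x = 0.0
    xs = np.empty(len(forces))
    for k, f in enumerate(forces):
        v += (f / mass) * dt
        x += v * dt
        xs[k] = x
    return xs

def identify_mass(real_traj, forces, m0=0.5, lr=0.05, iters=300):
    """Gradient descent on inverse mass u = 1/m.

    With zero initial state the trajectory equals u times the unit-mass
    trajectory, so the gradient of the squared trajectory error with
    respect to u is available in closed form.
    """
    unit_traj = simulate(1.0, forces)      # rollout at m = 1, so sim = u * unit_traj
    u = 1.0 / m0
    for _ in range(iters):
        residual = u * unit_traj - real_traj
        u -= lr * 2.0 * np.dot(residual, unit_traj)
    return 1.0 / u

forces = np.ones(100)                      # a known 1 N push applied for 1 s
real_traj = simulate(2.0, forces)          # stand-in for the observed rollout of a 2 kg object
print(identify_mass(real_traj, forces))    # converges to ~2.0
```

The same trajectory-matching loss carries over to full rigid-body engines, where the gradient is obtained by automatic differentiation through contact dynamics rather than in closed form, and the "real" trajectory comes from visual tracking instead of a reference simulation.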

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Differentiable real-to-sim-to-real framework for object mass identification

The authors propose a framework that combines Gaussian Splat representations with differentiable physics simulation to identify object mass from visual observations and robot control signals. This enables automatic construction of high-fidelity, physically plausible digital twins through end-to-end optimization.

Contribution

Force-aware grasping policy learning from human demonstrations

The authors introduce a method that transfers human demonstrations into robot-executable trajectories in simulation and trains policies that combine position and force control conditioned on identified object mass. This hybrid control approach enables robust grasping across varying object masses.

Contribution

End-to-end mass identification through differentiable simulation

The framework leverages differentiable physics engines to optimize object mass by minimizing trajectory discrepancies between simulation and real-world robot-object interactions. Unlike prior methods requiring manually specified forces, this approach uses consistent robotic control signals for end-to-end optimization.