Statistical Guarantees for Offline Domain Randomization

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Reinforcement Learning, Domain Randomization, Sim-to-Real
Abstract:

Reinforcement-learning agents often struggle when deployed from simulation to the real world. A dominant strategy for reducing the sim-to-real gap is domain randomization (DR), which trains the policy across many simulators produced by sampling dynamics parameters, but standard DR ignores offline data already available from the real system. We study offline domain randomization (ODR), which first fits a distribution over simulator parameters to an offline dataset. While a growing body of empirical work reports substantial gains with algorithms such as DROPO, the theoretical foundations of ODR remain largely unexplored. In this work, we cast ODR as maximum-likelihood estimation over a parametric simulator family and provide statistical guarantees: under mild regularity and identifiability conditions, the estimator is weakly consistent (it converges in probability to the true dynamics as data grows), and it becomes strongly consistent (i.e., it converges almost surely to the true dynamics) when an additional uniform Lipschitz continuity assumption holds. We examine the practicality of these assumptions and outline relaxations that justify ODR's applicability across a broader range of settings. Taken together, our results place ODR on a principled footing and clarify when offline data can soundly guide the choice of a randomization distribution for downstream offline RL.
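In symbols (a sketch with notation assumed here, not reproduced from the paper), the maximum-likelihood formulation described in the abstract takes roughly the form

```latex
\hat{\theta}_N \;=\; \arg\max_{\theta \in \Theta} \;\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\!\left(s'_i \mid s_i, a_i\right),
```

where $(s_i, a_i, s'_i)$ are transitions from the offline dataset and $p_\theta$ is the transition density of the simulator with parameters $\theta \in \Theta$; the consistency results then concern the behavior of $\hat{\theta}_N$ as $N \to \infty$.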

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

This paper contributes formal statistical guarantees for offline domain randomization (ODR), establishing weak and strong consistency of maximum-likelihood estimators over simulator parameter families. It resides in the 'Statistical Consistency and Convergence Analysis' leaf, which contains only two papers total. This is a notably sparse research direction within the broader taxonomy of 49 papers, indicating that rigorous theoretical analysis of ODR remains underexplored despite growing empirical interest in methods like DROPO.

The taxonomy reveals that most work concentrates in algorithmic development and application domains rather than theoretical foundations. The sibling leaf 'Theoretical Understanding of Domain Randomization' contains one paper on general DR theory, while neighboring branches house six papers on offline DR algorithms and six on adaptive methods. The paper's theoretical focus contrasts sharply with the empirical emphasis of nearby algorithmic work, positioning it at the intersection of formal analysis and practical offline methods that leverage real-world data to inform simulator distributions.

Among the 28 candidates examined, weak consistency (Contribution 1) drew one potentially refutable prior work out of eight candidates reviewed, suggesting some overlap in establishing basic convergence properties. For strong consistency under uniform Lipschitz continuity (Contribution 2) and the relaxations-and-diagnostics framework (Contribution 3), ten candidates each were examined with no clear refutations found. Because the search reflects top-K semantic matches rather than exhaustive coverage, these statistics are indicative only, but the pattern suggests the strong consistency result and the practical relaxations may represent more novel theoretical territory within the examined literature.

Given the sparse theoretical landscape and limited search scope of 28 candidates, the work appears to address a genuine gap in formal guarantees for offline domain randomization. The single sibling paper and absence of refutations for two of three contributions suggest novelty, though the analysis cannot rule out relevant prior work outside the top-K semantic neighborhood or in adjacent fields like system identification or statistical learning theory that may not surface in domain-specific searches.

Taxonomy

Core-task Taxonomy Papers: 49
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 1

Research Landscape Overview

Core task: offline domain randomization for sim-to-real transfer in reinforcement learning. The field addresses how to train robust policies in simulation that generalize to real-world deployment by randomizing simulator parameters, with a particular focus on offline settings where real-world data collection is expensive or risky. The taxonomy reveals four main branches:

- Theoretical Foundations and Guarantees explores statistical consistency and convergence properties of domain randomization methods, providing formal underpinnings for when and why randomization works;
- Algorithmic Approaches and Methods encompasses diverse techniques, ranging from active selection of randomization parameters to entropy-based strategies and continual adaptation schemes;
- Application Domains demonstrates the breadth of real-world problems tackled, including robotic manipulation, autonomous navigation, medical procedures, and aerial systems;
- Evaluation and Benchmarking focuses on metrics, experimental protocols, and comparative studies that assess transferability and validate simulation fidelity.

Representative works span foundational surveys such as Sim-to-Real Survey[3] and Randomized Simulations Review[20] as well as practical implementations such as Blind Bipedal Stair[5] and Autonomous Blood Suction[10], illustrating the interplay between theory and application. Recent research has intensified around several contrasting themes: some works pursue provable guarantees and sample-efficient offline methods, while others emphasize adaptive or active randomization that refines distributions during training. A key tension exists between broad randomization for robustness and targeted parameter selection for efficiency, as seen in comparisons between uniform approaches and methods like Active Domain Randomization[15] or Entropy Maximization DR[13].
Offline Domain Randomization[0] sits within the Theoretical Foundations branch alongside Provable Offline DR[2], both addressing statistical consistency and convergence when learning from fixed datasets without online environment interaction. Compared to DROPO[1], which also tackles offline settings, Offline Domain Randomization[0] emphasizes rigorous convergence analysis, while neighboring application-focused works like Dynamics Randomization[4] or Continual Domain Randomization[6] prioritize empirical performance across diverse tasks. This positioning highlights an ongoing challenge: bridging formal guarantees with practical deployment needs across the spectrum from theory to real-world systems.

Claimed Contributions

Weak consistency of the ODR estimator

The authors prove that the offline domain randomization estimator, formulated as maximum-likelihood estimation over a parametric simulator family, converges in probability to the true dynamics parameters as the offline dataset size increases, under regularity, positivity, and identifiability assumptions.

8 retrieved papers
Can Refute
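As a purely illustrative aside (a toy model of our own, not the paper's setting or method), the flavor of weak consistency can be seen numerically: for linear-Gaussian dynamics with a single unknown parameter, the maximum-likelihood fit from offline transitions tightens around the truth as the dataset grows.

```python
import numpy as np

# Toy illustration (hypothetical setup): offline transitions from
# linear-Gaussian dynamics s' = theta * s + noise, noise ~ N(0, 1).
# The MLE for theta has a closed form (ordinary least squares), and
# its error shrinks as the offline dataset grows, mirroring the
# "convergence in probability" that weak consistency formalizes.
rng = np.random.default_rng(0)
theta_true = 0.8

def mle_theta(n):
    """Fit theta by maximum likelihood from n offline transitions."""
    s = rng.normal(size=n)                     # states visited offline
    s_next = theta_true * s + rng.normal(size=n)
    return np.sum(s * s_next) / np.sum(s * s)  # closed-form Gaussian MLE

errors = {n: abs(mle_theta(n) - theta_true) for n in (100, 10_000, 1_000_000)}
for n, e in errors.items():
    print(f"N={n:>9,d}  |theta_hat - theta*| = {e:.4f}")
```

Here the OLS solution coincides with the Gaussian MLE; the point is only that the estimation error shrinks on the order of $1/\sqrt{N}$, which is the behavior the weak-consistency guarantee formalizes for general parametric simulator families.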
Strong consistency of the ODR estimator under uniform Lipschitz continuity

By adding a uniform Lipschitz continuity assumption on the likelihood function, the authors upgrade the convergence guarantee from weak (in probability) to strong (almost sure) consistency, meaning the estimator converges to the true parameter with probability one.

10 retrieved papers
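For reference, the two modes of convergence being contrasted (standard textbook definitions, with $\hat{\theta}_N$ the estimate from $N$ samples and $\theta^*$ the true parameter) are:

```latex
\text{weak:}\quad \forall \varepsilon > 0,\;
\Pr\!\left(\lVert \hat{\theta}_N - \theta^* \rVert > \varepsilon\right) \to 0
\;\;\text{as } N \to \infty;
\qquad
\text{strong:}\quad
\Pr\!\left(\lim_{N \to \infty} \hat{\theta}_N = \theta^*\right) = 1.
```

Almost-sure convergence implies convergence in probability, so the strong result subsumes the weak one at the cost of the additional uniform Lipschitz assumption.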
Relaxations and diagnostics for practical applicability of assumptions

The authors analyze when their theoretical assumptions hold in practice and provide relaxations such as replacing i.i.d. with stationarity and ergodicity, weakening mixture positivity via a logarithmic tail condition, and giving sufficient conditions for the uniform Lipschitz requirement, thereby broadening the applicability of their theoretical framework.

10 retrieved papers
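The i.i.d.-to-ergodicity relaxation is plausible on standard grounds (a sketch of the usual argument, not reproduced from the paper): by Birkhoff's ergodic theorem, for a stationary ergodic data sequence the empirical log-likelihood average still converges almost surely to its expectation for each fixed $\theta$,

```latex
\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\!\left(s'_i \mid s_i, a_i\right)
\;\xrightarrow{\text{a.s.}}\;
\mathbb{E}\!\left[\log p_\theta\!\left(s' \mid s, a\right)\right],
```

which is exactly the role the strong law of large numbers plays in the i.i.d. case; a uniform-in-$\theta$ version then typically follows from equicontinuity-type conditions such as the Lipschitz requirement discussed above.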

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Weak consistency of the ODR estimator


Contribution

Strong consistency of the ODR estimator under uniform Lipschitz continuity


Contribution

Relaxations and diagnostics for practical applicability of assumptions

