RealBench: A Benchmark for Complex Physical Systems with Real-World Data

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: complex physical system, PDE, benchmark, real-world data, prediction
Abstract:

Predicting the evolution of complex physical systems remains a central problem in science and engineering. Despite rapid progress in scientific machine learning (ML) models, a critical bottleneck is the scarcity of real-world data, which is expensive to collect; as a result, most current models are trained and validated on simulated data. Beyond limiting the development and evaluation of scientific ML, this gap also hinders research into essential tasks such as sim-to-real transfer. We introduce RealPDEBench, the first benchmark for scientific ML that integrates real-world measurements with paired numerical simulations. RealPDEBench consists of five datasets, three tasks, eight metrics, and ten baselines. We first present five real-world measured datasets with paired simulated datasets across different complex physical systems. We then define three tasks that allow comparisons between real-world and simulated data and facilitate the development of methods to bridge the two. Moreover, we design eight evaluation metrics, spanning data-oriented and physics-oriented measures, and finally benchmark ten representative baselines, including state-of-the-art models, pretrained PDE foundation models, and a traditional method. Experiments reveal significant discrepancies between simulated and real-world data, while showing that pretraining on simulated data consistently improves both accuracy and convergence. With this work, we hope to provide insights grounded in real-world data, advancing scientific ML toward bridging the sim-to-real gap and real-world deployment.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces RealPDEBench, a benchmark integrating real-world measurements with paired numerical simulations for scientific machine learning. It resides in the 'Physics-Informed Neural Networks and Uncertainty Quantification' leaf, which contains only three papers total. This leaf sits within the broader 'Physics-Informed and Hybrid Modeling Approaches' branch, indicating a relatively sparse research direction compared to the more crowded robotic control branches (15 papers across three leaves). The focus on benchmark infrastructure for PDE prediction distinguishes it from the sibling papers, which emphasize calibration methods and graph-based physics engines.

The taxonomy reveals that neighboring leaves address hybrid transfer learning with physics priors (1 paper) and model-based reinforcement learning (3 papers), both emphasizing policy learning rather than benchmark construction. The broader field structure shows that most sim-to-real work concentrates on robotic control (15 papers) and digital twin monitoring (13 papers), with physics-informed modeling receiving less attention (5 papers total). RealPDEBench diverges from these directions by targeting scientific ML evaluation infrastructure rather than control policies or industrial monitoring, occupying a niche at the intersection of data-driven learning and physics-based simulation validation.

Among 30 candidates examined, the contribution-level analysis shows varied novelty profiles. The paired real-world and simulated dataset contribution (10 candidates examined, 0 refutable) appears most distinctive, as no prior work provides this specific benchmark infrastructure. The three task categories (10 candidates, 0 refutable) also show no direct overlap. However, the comprehensive evaluation framework (10 candidates, 1 refutable) encounters at least one candidate offering overlapping metrics or evaluation approaches. Given the limited search scope, these statistics suggest the benchmark infrastructure itself is relatively novel, while the evaluation methodology has more substantial prior work within the examined candidates.

Based on the top-30 semantic matches and taxonomy structure, the work addresses a sparse research direction with limited direct competition in its specific leaf. The benchmark contribution appears more novel than the evaluation framework, though the restricted search scope means additional relevant work may exist beyond the candidates examined. The taxonomy context suggests this represents a meaningful but incremental step in a less-explored corner of the broader sim-to-real transfer landscape.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Paper: 1

Research Landscape Overview

Core task: bridging the sim-to-real gap in complex physical system prediction. The field addresses the challenge of ensuring that models trained or validated in simulation can reliably predict real-world behavior across diverse physical systems. The taxonomy reveals five main branches:

- Sim-to-Real Transfer Methods for Robotic Control: domain randomization, policy adaptation, and reinforcement learning techniques that enable robots to generalize from synthetic to physical environments (e.g., Dynamics Randomization[5], Closing Sim-to-Real Loop[7]).
- Surveys and Frameworks: conceptual overviews and methodological guidance (Sim-to-Real Survey[3]).
- Autonomous Systems and Embodied AI: end-to-end learning and perception-action loops in agents operating in real environments (Embodied AI Survey[2]).
- Digital Twins and Virtual Monitoring Systems: persistent virtual replicas for industrial assets, infrastructure, and energy systems (Digital Twin Turbines[11], Multi Digital Twin[14]).
- Physics-Informed and Hybrid Modeling Approaches: integration of domain knowledge, neural networks, and uncertainty quantification to improve predictive fidelity and calibration (Calibrated Physics Informed[15], Graph Physics Engines[45]).

A particularly active line of work explores how to blend data-driven flexibility with physical constraints, trading off model expressiveness against interpretability and sample efficiency. Another contrasting theme is whether to adapt simulators to match reality through system identification and calibration, or to learn robust policies that tolerate discrepancies via randomization and domain adaptation.

RealBench[0] sits within the Physics-Informed and Hybrid Modeling branch, specifically addressing physics-informed neural networks and uncertainty quantification. It shares methodological kinship with Calibrated Physics Informed[15], which also emphasizes calibration and uncertainty-aware prediction, and with Graph Physics Engines[45], which leverages structured representations of physical interactions. Where some works prioritize pure learning or pure physics, RealBench[0] occupies a middle ground by systematically benchmarking how well hybrid approaches can close the sim-to-real gap when physical priors and neural flexibility are combined with rigorous uncertainty estimates.

Claimed Contributions

RealPDEBench benchmark with paired real-world and simulated data

The authors present RealPDEBench, the first scientific ML benchmark that systematically pairs real-world experimental measurements with numerical simulations across five complex physical systems. This benchmark includes more than 700 trajectories covering fluid dynamics and combustion scenarios, enabling systematic evaluation of models on real-world data and investigation of the sim-to-real gap.

10 retrieved papers
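The "paired" structure claimed above can be pictured with a minimal container type. This is only a sketch: the class name, field names, array shapes, and the gap helper are illustrative assumptions, not the benchmark's actual API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PairedTrajectory:
    """Hypothetical container pairing one real-world measurement with its
    matched numerical simulation (names/shapes are illustrative only)."""
    system: str        # e.g. a fluid-dynamics or combustion scenario
    real: np.ndarray   # measured fields, shape (T, C, H, W)
    sim: np.ndarray    # simulated fields on the same grid and time steps

    def sim_to_real_gap(self) -> float:
        """Per-trajectory RMSE between simulation and measurement."""
        return float(np.sqrt(np.mean((self.sim - self.real) ** 2)))
```

A dataset of such objects makes the sim-to-real discrepancy directly measurable trajectory by trajectory, which is what distinguishes a paired benchmark from separate simulated and real corpora.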
Three task categories for comparing real-world and simulated data

The authors define three training paradigms: training on simulated data, training on real-world data, and pretraining on simulated data followed by finetuning on real-world data. These tasks enable systematic comparison of the strengths and limitations of both data types and provide a foundation for developing methods that effectively combine them.

10 retrieved papers
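The three paradigms differ only in which data source feeds an otherwise generic training loop. The sketch below uses a linear one-step predictor (x_{t+1} ≈ W x_t) as a stand-in for any neural surrogate; the loop, data names, and hyperparameters are hypothetical, not the benchmark's reference implementation.

```python
import numpy as np

def train(weights, data, epochs=100, lr=1e-2):
    """Gradient descent on the one-step loss 0.5*||W x_t - x_{t+1}||^2,
    standing in for training any neural PDE surrogate."""
    for _ in range(epochs):
        for x_t, x_next in data:
            pred = weights @ x_t
            grad = np.outer(pred - x_next, x_t)  # dL/dW for the squared loss
            weights = weights - lr * grad
    return weights

# The three task settings, expressed with this one loop:
# 1) simulation only:      W = train(W0, sim_pairs)
# 2) real-world only:      W = train(W0, real_pairs)
# 3) sim-to-real transfer: W = train(W0, sim_pairs)            # pretrain
#                          W = train(W,  real_pairs, lr=1e-3)  # finetune, smaller lr
```

Task 3 reuses the simulation-trained weights as initialization for the real-data stage, which is the mechanism behind the paper's finding that simulated pretraining improves accuracy and convergence.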
Comprehensive evaluation framework with data-oriented and physics-oriented metrics

The authors introduce a comprehensive evaluation framework consisting of eight metrics that assess model performance from both data-oriented perspectives (such as RMSE and MAE) and physics-oriented perspectives (such as Fourier Space Error and Kinetic Energy Error). They benchmark ten representative baselines including state-of-the-art models and pretrained foundation models using this framework.

10 retrieved papers
Can Refute
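As a rough illustration of the two metric families, the sketch below implements two data-oriented metrics (RMSE, MAE) and two physics-oriented ones (a Fourier-space error and a kinetic-energy error). The exact formulas and field layouts used by RealPDEBench may differ; these definitions are assumptions.

```python
import numpy as np

def rmse(pred, true):
    """Data-oriented: root-mean-square error over all grid points."""
    return np.sqrt(np.mean((pred - true) ** 2))

def mae(pred, true):
    """Data-oriented: mean absolute error."""
    return np.mean(np.abs(pred - true))

def fourier_space_error(pred, true):
    """Physics-oriented: relative L2 error between 2D Fourier amplitude
    spectra, penalizing missing or spurious spatial scales.
    (Illustrative definition; the benchmark's may differ.)"""
    fp = np.abs(np.fft.fft2(pred))
    ft = np.abs(np.fft.fft2(true))
    return np.linalg.norm(fp - ft) / np.linalg.norm(ft)

def kinetic_energy_error(pred_uv, true_uv):
    """Physics-oriented: relative error of total kinetic energy
    0.5*(u^2 + v^2) summed over the domain.
    pred_uv/true_uv: velocity fields of shape (2, H, W)."""
    ke = lambda uv: 0.5 * np.sum(uv[0] ** 2 + uv[1] ** 2)
    return abs(ke(pred_uv) - ke(true_uv)) / ke(true_uv)
```

The point of the physics-oriented family is that a prediction can score well on pointwise RMSE while still distorting the energy spectrum or total kinetic energy, so the two families are complementary.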

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

RealPDEBench benchmark with paired real-world and simulated data

The authors present RealPDEBench, the first scientific ML benchmark that systematically pairs real-world experimental measurements with numerical simulations across five complex physical systems. This benchmark includes more than 700 trajectories covering fluid dynamics and combustion scenarios, enabling systematic evaluation of models on real-world data and investigation of the sim-to-real gap.

Contribution

Three task categories for comparing real-world and simulated data

The authors define three training paradigms: training on simulated data, training on real-world data, and pretraining on simulated data followed by finetuning on real-world data. These tasks enable systematic comparison of the strengths and limitations of both data types and provide a foundation for developing methods that effectively combine them.

Contribution

Comprehensive evaluation framework with data-oriented and physics-oriented metrics

The authors introduce a comprehensive evaluation framework consisting of eight metrics that assess model performance from both data-oriented perspectives (such as RMSE and MAE) and physics-oriented perspectives (such as Fourier Space Error and Kinetic Energy Error). They benchmark ten representative baselines including state-of-the-art models and pretrained foundation models using this framework.
