RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data
Overview
Overall Novelty Assessment
The paper introduces RealPDEBench, a benchmark integrating real-world measurements with paired numerical simulations for scientific machine learning. It resides in the 'Physics-Informed Neural Networks and Uncertainty Quantification' leaf, which contains only three papers total. This leaf sits within the broader 'Physics-Informed and Hybrid Modeling Approaches' branch, indicating a relatively sparse research direction compared to the more crowded robotic control branches (15 papers across three leaves). The focus on benchmark infrastructure for PDE prediction distinguishes it from the sibling papers, which emphasize calibration methods and graph-based physics engines.
The taxonomy reveals that neighboring leaves address hybrid transfer learning with physics priors (1 paper) and model-based reinforcement learning (3 papers), both emphasizing policy learning rather than benchmark construction. The broader field structure shows that most sim-to-real work concentrates on robotic control (15 papers) and digital twin monitoring (13 papers), with physics-informed modeling receiving less attention (5 papers total). RealPDEBench diverges from these directions by targeting scientific ML evaluation infrastructure rather than control policies or industrial monitoring, occupying a niche at the intersection of data-driven learning and physics-based simulation validation.
Among 30 candidates examined, the contribution-level analysis shows varied novelty profiles. The paired real-world and simulated dataset contribution (10 candidates examined, 0 refutable) appears most distinctive, as no prior work provides this specific benchmark infrastructure. The three task categories (10 candidates, 0 refutable) likewise show no direct overlap. However, the comprehensive evaluation framework (10 candidates, 1 refutable) is partially anticipated by at least one candidate offering overlapping metrics or evaluation approaches. Given the limited search scope, these statistics suggest the benchmark infrastructure itself is relatively novel, while the evaluation methodology has more substantial prior work within the examined candidates.
Based on the top-30 semantic matches and taxonomy structure, the work addresses a sparse research direction with limited direct competition in its specific leaf. The benchmark contribution appears more novel than the evaluation framework, though the restricted search scope means additional relevant work may exist beyond the candidates examined. The taxonomy context suggests this represents a meaningful but incremental step in a less-explored corner of the broader sim-to-real transfer landscape.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors present RealPDEBench, the first scientific ML benchmark that systematically pairs real-world experimental measurements with numerical simulations across five complex physical systems. This benchmark includes more than 700 trajectories covering fluid dynamics and combustion scenarios, enabling systematic evaluation of models on real-world data and investigation of the sim-to-real gap.
The authors define three training paradigms: training on simulated data, training on real-world data, and pretraining on simulated data followed by finetuning on real-world data. These tasks enable systematic comparison of the strengths and limitations of both data types and provide a foundation for developing methods that effectively combine them.
The authors introduce a comprehensive evaluation framework consisting of eight metrics that assess model performance from both data-oriented perspectives (such as RMSE and MAE) and physics-oriented perspectives (such as Fourier Space Error and Kinetic Energy Error). They benchmark ten representative baselines including state-of-the-art models and pretrained foundation models using this framework.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[15] Calibrated Physics-Informed Uncertainty Quantification
[45] Graph networks as learnable physics engines for inference and control
Contribution Analysis
Detailed comparisons for each claimed contribution
RealPDEBench benchmark with paired real-world and simulated data
The authors present RealPDEBench, the first scientific ML benchmark that systematically pairs real-world experimental measurements with numerical simulations across five complex physical systems. This benchmark includes more than 700 trajectories covering fluid dynamics and combustion scenarios, enabling systematic evaluation of models on real-world data and investigation of the sim-to-real gap.
[61] Filtered partial differential equations: a robust surrogate constraint in physics-informed deep learning framework
[62] Computational, Data-Driven, and Physics-Informed Machine Learning Approaches for Microstructure Modeling in Metal Additive Manufacturing
[63] Physics-informed deep-learning applications to experimental fluid mechanics
[64] Bulk Low-Inertia Power Systems Adaptive Fault Type Classification Method Based on Machine Learning and Phasor Measurement Units Data
[65] Predicting fusion ignition at the National Ignition Facility with physics-informed deep learning
[66] Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments
[67] On the prediction of critical heat flux using a physics-informed machine learning-aided framework
[68] Real-time Fusion of Multi-Source Monitoring Data with Geotechnical Numerical Model Results using Data-driven and Physics-informed Sparse Dictionary Learning
[69] Scientific Machine Learning (SciML) - How the Fusion of AI and Physics is Giving Rise to Promising Simulation Methodologies
[70] Evaluating Universal Machine Learning Force Fields Against Experimental Measurements
Three task categories for comparing real-world and simulated data
The authors define three training paradigms: training on simulated data, training on real-world data, and pretraining on simulated data followed by finetuning on real-world data. These tasks enable systematic comparison of the strengths and limitations of both data types and provide a foundation for developing methods that effectively combine them.
[71] Interpretable machine learning for science with PySR and SymbolicRegression.jl
[72] Combining machine learning and simulation to a hybrid modelling approach: Current and future directions
[73] Domain-adaptive neural networks improve supervised machine learning based on simulated population genetic data
[74] Next-generation deep learning based on simulators and synthetic data
[75] From real-world patient data to individualized treatment effects using machine learning: current and future methods to address underlying challenges
[76] MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning
[77] Challenges of real-world reinforcement learning: definitions, benchmarks and analysis
[78] Merging physics-based synthetic data and machine learning for thermal monitoring of lithium-ion batteries: the role of data fidelity
[79] Physics informed synthetic image generation for deep learning-based detection of wrinkles and folds
[80] Transfer-learning: Bridging the gap between real and simulation data for machine learning in injection molding
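The three training paradigms (simulated only, real only, and pretrain-on-simulation then finetune-on-real) can be illustrated with a minimal toy sketch. Everything below is hypothetical and not from the benchmark: the linear model, the `fit_linear` helper, the synthetic bias added to the simulator, and the data sizes are illustrative stand-ins for a scenario where simulation is plentiful but systematically biased, while real measurements are scarce but faithful.

```python
import numpy as np

def fit_linear(X, y, w0=None, lr=0.1, steps=500):
    """Least-squares fit by gradient descent; w0 allows warm-starting (finetuning)."""
    w = np.zeros(X.shape[1]) if w0 is None else w0.copy()
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y)) / len(y)
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])            # the "real-world" physics
X_sim = rng.normal(size=(500, 2))
y_sim = X_sim @ (w_true + 0.3)            # abundant simulator data, systematically biased
X_real = rng.normal(size=(20, 2))
y_real = X_real @ w_true                  # scarce real measurements

w_sim_only = fit_linear(X_sim, y_sim)                     # paradigm 1: simulated data only
w_real_only = fit_linear(X_real, y_real)                  # paradigm 2: real data only
w_finetuned = fit_linear(X_real, y_real, w0=w_sim_only)   # paradigm 3: pretrain + finetune
```

In this toy setup the sim-only model inherits the simulator's bias, while the finetuned model starts near a good solution and recovers the true parameters from few real samples, which is the kind of comparison the three task categories are designed to enable.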
Comprehensive evaluation framework with data-oriented and physics-oriented metrics
The authors introduce a comprehensive evaluation framework consisting of eight metrics that assess model performance from both data-oriented perspectives (such as RMSE and MAE) and physics-oriented perspectives (such as Fourier Space Error and Kinetic Energy Error). They benchmark ten representative baselines including state-of-the-art models and pretrained foundation models using this framework.
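The split between data-oriented and physics-oriented metrics can be sketched as follows. The exact definitions used by RealPDEBench are not reproduced here; these NumPy formulations (the function names, the relative normalization of the Fourier and kinetic-energy errors, and the 2-D velocity-field inputs) are plausible assumptions, not the benchmark's implementation.

```python
import numpy as np

def rmse(pred, true):
    """Data-oriented: root-mean-square error over all grid points."""
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def mae(pred, true):
    """Data-oriented: mean absolute error over all grid points."""
    return float(np.mean(np.abs(pred - true)))

def fourier_space_error(pred, true):
    """Physics-oriented: relative L2 error between 2-D Fourier amplitude spectra."""
    fp = np.abs(np.fft.fft2(pred))
    ft = np.abs(np.fft.fft2(true))
    return float(np.linalg.norm(fp - ft) / np.linalg.norm(ft))

def kinetic_energy_error(u_pred, v_pred, u_true, v_true):
    """Physics-oriented: relative error of mean kinetic energy 0.5*(u^2 + v^2)."""
    ke_pred = 0.5 * np.mean(u_pred ** 2 + v_pred ** 2)
    ke_true = 0.5 * np.mean(u_true ** 2 + v_true ** 2)
    return float(abs(ke_pred - ke_true) / ke_true)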