Uncertainty-Aware Diagnostics for Physics-Informed Machine Learning

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: physics informed, gaussian process, model selection, uncertainty quantification
Abstract:

Physics-informed machine learning (PIML) integrates prior physical information, often in the form of differential equation constraints, into the process of fitting ML models to physical data. Popular PIML approaches, including neural operators, physics-informed neural networks, and neural ordinary differential equations, are typically fit to objectives that simultaneously include both data and physical constraints. However, the multi-objective nature of this approach creates ambiguity in the measurement of model quality. This is related to a poor understanding of epistemic uncertainty, and it can lead to surprising failure modes, even when existing metrics suggest strong fits. Working within a Gaussian process regression framework, we introduce the Physics-Informed Log Evidence (PILE) score. Bypassing the ambiguities of test losses, the PILE score is a single, uncertainty-aware metric that provides a selection principle for hyperparameters of a physics-informed model. We show that PILE minimization yields excellent choices for a wide variety of model parameters, including kernel bandwidth, least squares regularization weights, and even kernel function selection. We also show that, prior to data acquisition, a special data-free case of the PILE score identifies a-priori kernel choices that are "well adapted" to a given PDE. Beyond the kernel setting, we anticipate that the PILE score can be extended to PIML at large, and we outline approaches to do so.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces the Physics-Informed Log Evidence (PILE) score as a unified metric for hyperparameter selection in Gaussian process-based physics-informed models. It resides in the Theoretical Foundations and Diagnostic Metrics leaf, which contains only two papers total. This sparse population suggests the development of principled selection criteria for physics-informed models remains an underexplored area. The leaf sits within the broader Methodological Frameworks branch, which encompasses Bayesian approaches, ensemble methods, and distance-aware techniques, indicating the work contributes to foundational methodology rather than domain-specific applications.

The taxonomy reveals substantial activity in neighboring methodological categories—Bayesian Physics-Informed Neural Networks contains five papers, Variational and Approximate Inference Methods has two, and Distance-Aware and Evidential Uncertainty Methods includes three. These sibling leaves focus on posterior inference, variational approximations, and calibrated predictions respectively. The PILE score diverges by addressing model selection through marginal likelihood rather than posterior sampling or ensemble aggregation. The scope note for Theoretical Foundations explicitly excludes application-specific validation, positioning this work as a general-purpose diagnostic framework applicable across the diverse domain-specific branches visible in the taxonomy.

Among the twenty-eight candidates examined, none clearly refutes any of the three core contributions. The PILE score itself was compared against ten candidates with no refutable overlap; the data-free Fredholm determinant formulation was compared against eight candidates with no prior work identified; and the empirical validation was compared against ten candidates, again with no substantial precedent. This limited search scope (roughly half the taxonomy's fifty papers) suggests the analysis captures top semantic matches but cannot claim exhaustive coverage. The absence of refutable candidates across all contributions indicates either genuine novelty within the examined set or that closely related work lies outside the top-K retrieval window.

The analysis reflects a targeted literature search rather than comprehensive field coverage. The sparse Theoretical Foundations leaf and zero refutable pairs across contributions suggest the PILE score addresses a gap in uncertainty-aware model selection for physics-informed Gaussian processes. However, the twenty-eight-candidate scope leaves open the possibility that relevant prior work exists in adjacent methodological areas or domain-specific applications not captured by semantic similarity. The taxonomy structure indicates active development in related Bayesian and variational methods, which may share conceptual overlap not detected by the current search strategy.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 0

Research Landscape Overview

Core task: uncertainty quantification in physics-informed machine learning. The field organizes around three main branches that reflect complementary perspectives on integrating physical knowledge with data-driven models while rigorously characterizing uncertainty.

Methodological Frameworks for Uncertainty Estimation develops foundational techniques that enable practitioners to assess epistemic and aleatoric uncertainties in physics-informed neural networks and related architectures. These range from Bayesian approaches like those in Bayesian Machine Learning[16] and Variational Inference SDEs[44] to ensemble methods and novel diagnostic metrics exemplified by Uncertainty-Aware Diagnostics[0]. Domain-Specific Applications of Uncertainty Quantification translates these methods into diverse engineering and scientific contexts, including fluid dynamics (Turbulence Closures Uncertainty[8], Flow Reconstruction[10]), structural health monitoring (Fatigue Life Prediction[6], Bearing Health Prediction[19]), transportation systems (Bayesian Traffic Prediction[2], Car-Following Behaviors[7]), and energy applications (Critical Heat Flux[3], Wind Turbines Power[20]). Cross-Cutting Methodological Advances and Reviews synthesizes insights across domains, offering surveys like Engineering Systems Survey[23] and Bayesian Calibration Survey[45] that distill common challenges and emerging best practices.

Recent work highlights tensions between computational efficiency and rigorous uncertainty bounds, with many studies exploring trade-offs between sampling-based Bayesian inference and faster deterministic approximations. Uncertainty-Aware Diagnostics[0] sits within the Theoretical Foundations and Diagnostic Metrics cluster, emphasizing the development of principled evaluation criteria for uncertainty estimates, a concern shared by Label-Free Deep Learning[40], which addresses uncertainty without extensive labeled data. Compared to application-focused neighbors like Critical Heat Flux[3] or Flight Dynamic Uncertainty[4], Uncertainty-Aware Diagnostics[0] prioritizes methodological rigor in validating uncertainty predictions rather than domain-specific deployment. This positioning reflects ongoing debates about whether general-purpose diagnostics can adequately capture the nuances of physical constraints, or whether each application domain requires tailored uncertainty metrics that respect governing equations and boundary conditions.

Claimed Contributions

Physics-Informed Log Evidence (PILE) score

The authors propose the PILE score, a single uncertainty-aware metric derived from the marginal likelihood of a Gaussian process model. This score resolves the multi-objective ambiguity in physics-informed machine learning by providing a principled way to select hyperparameters such as kernel bandwidth, regularization weights, and kernel functions without relying on ambiguous test losses.

10 retrieved papers
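
As a rough illustration of the mechanism this contribution builds on, the sketch below selects a kernel bandwidth by minimizing the standard Gaussian process negative log marginal likelihood. It is not the PILE score itself, whose exact form is not given in this report: the physics-informed terms are omitted, and the RBF kernel, the fixed noise variance `noise_var`, the toy sine data, and all function names are assumptions made purely for illustration.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth):
    """Squared-exponential kernel on 1-D inputs (an assumed kernel choice)."""
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-0.5 * d2 / bandwidth**2)


def neg_log_evidence(x, y, bandwidth, noise_var=1e-2):
    """Standard GP negative log marginal likelihood, -log p(y | x, bandwidth).

    This is the plain regression evidence only; no PDE constraint is included.
    """
    n = x.shape[0]
    K = rbf_kernel(x, x, bandwidth) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    data_fit = 0.5 * y @ alpha                           # 0.5 * y^T K^{-1} y
    complexity = np.sum(np.log(np.diag(L)))              # 0.5 * log det K
    return data_fit + complexity + 0.5 * n * np.log(2.0 * np.pi)


# Hypothetical toy data: a noisy sine wave on [0, 1].
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(40)

# Score a grid of candidate bandwidths and keep the minimizer.
bandwidths = np.logspace(-2, 1, 30)
scores = np.array([neg_log_evidence(x, y, b) for b in bandwidths])
best_bandwidth = bandwidths[np.argmin(scores)]
print(f"selected bandwidth: {best_bandwidth:.3f}")
```

The two terms trade data fit against model complexity, which is the usual argument for using the evidence as a single selection criterion instead of a separate test loss.
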
Data-free PILE score via Fredholm determinant

The authors introduce a data-free variant of the PILE score that converges to a Fredholm determinant as the number of quadrature points increases. This metric enables a priori kernel selection before any data is collected, identifying kernels that are inherently suited to solving a given partial differential equation.

8 retrieved papers
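
The convergence statement can be made concrete with a generic quadrature (Nyström-type) approximation of a Fredholm determinant det(I + zK) on an interval. The sketch below is an illustration under stated assumptions, not the authors' data-free PILE score: this report does not specify the operator, sign convention, or scaling, so the Gaussian kernel, the interval [0, 1], the parameter `z`, and the function names are all hypothetical.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=0.5):
    """Smooth example kernel (an assumed choice, not taken from the paper)."""
    return np.exp(-0.5 * (x - y) ** 2 / bandwidth**2)


def fredholm_det(kernel, a, b, m, z=1.0):
    """Approximate det(I + z K) on [a, b] with m Gauss-Legendre nodes.

    The integral operator K is replaced by the symmetric m x m matrix
    sqrt(w_i) * k(x_i, x_j) * sqrt(w_j), built from quadrature nodes x_i
    and weights w_i.
    """
    t, w = np.polynomial.legendre.leggauss(m)  # nodes and weights on [-1, 1]
    x = 0.5 * (b - a) * t + 0.5 * (b + a)      # map nodes to [a, b]
    w = 0.5 * (b - a) * w                      # rescale weights
    sw = np.sqrt(w)
    A = sw[:, None] * kernel(x[:, None], x[None, :]) * sw[None, :]
    sign, logdet = np.linalg.slogdet(np.eye(m) + z * A)
    return sign * np.exp(logdet)


# The finite determinants settle down as the number of nodes grows.
for m in (5, 10, 20, 40):
    print(m, fredholm_det(gaussian_kernel, 0.0, 1.0, m))
```

For a smooth kernel such as this one the values stabilize quickly with `m`; a kernel that is not smooth across the diagonal would generally need more quadrature points for the same accuracy.
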
Empirical validation of PILE for hyperparameter optimization

The authors demonstrate through case studies that minimizing the PILE score yields excellent hyperparameter choices across various settings, including kernel bandwidth selection, regularization weight tuning, and kernel function selection. They show that PILE can diagnose model misspecification and identify optimal kernels, leading to vastly improved performance in challenging scenarios such as the wave equation.

10 retrieved papers
