Probabilistic Robustness for Free? Revisiting Training via a Benchmark

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Trustworthy AI; Probabilistic Robustness; Benchmark
Abstract:

Deep learning models are notoriously vulnerable to imperceptible perturbations. Most existing research centers on adversarial robustness (AR), which evaluates models under worst-case scenarios by examining the existence of deterministic adversarial examples (AEs). In contrast, probabilistic robustness (PR) adopts a statistical perspective, measuring the probability that predictions remain correct under stochastic perturbations. While PR is widely regarded as a practical complement to AR, dedicated training methods for improving PR are still relatively underexplored, albeit with emerging progress. Among the few PR-targeted training methods, we identify three limitations: i) non-comparable evaluation protocols; ii) limited comparisons to strong adversarial training (AT) baselines despite anecdotal PR gains from AT; and iii) no unified framework to compare the generalization of these methods. Thus, we introduce PRBench, the first benchmark dedicated to evaluating improvements in PR achieved by different robustness training methods. PRBench empirically compares the most common AT and PR-targeted training methods using a comprehensive set of metrics, including clean accuracy, PR and AR performance, training efficiency, and generalization error (GE). We also provide a theoretical analysis of the GE of PR performance across different training methods. Main findings revealed by PRBench include: AT methods are more versatile than PR-targeted training methods in terms of improving both AR and PR performance across diverse hyperparameter settings, while PR-targeted training methods consistently yield lower GE and higher clean accuracy. A leaderboard comprising 222 trained models across 7 datasets and 10 model architectures is publicly available at https://tmpspace.github.io/PRBenchLeaderboard/.
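The PR metric described in the abstract, the probability that a prediction stays correct under stochastic perturbations, is typically estimated by Monte Carlo sampling. The following is a minimal illustrative sketch under our own assumptions (the function name and the toy classifier are ours, not PRBench's):

```python
import numpy as np

def probabilistic_robustness(predict, x, y, eps=0.1, n_samples=1000, rng=None):
    """Monte Carlo estimate of probabilistic robustness: the fraction of
    perturbations, sampled uniformly from an L-infinity ball of radius eps,
    for which the model still predicts the true label y."""
    rng = np.random.default_rng(rng)
    # Draw perturbations uniformly from the L-infinity ball of radius eps.
    deltas = rng.uniform(-eps, eps, size=(n_samples,) + x.shape)
    preds = np.array([predict(x + d) for d in deltas])
    return float(np.mean(preds == y))

# Toy classifier: predicts class 1 iff the mean of the input exceeds 0.5.
predict = lambda x: int(x.mean() > 0.5)
x = np.full(4, 0.8)  # input well inside class 1
pr = probabilistic_robustness(predict, x, y=1, eps=0.1, rng=0)
```

For this toy input the estimate is 1.0, since no perturbation in the ball can push the mean below the decision threshold; an input closer to the boundary would yield an estimate strictly between 0 and 1, which is exactly the statistical quantity PR captures and AR's binary worst-case check does not.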

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PRBench, a benchmark for evaluating probabilistic robustness training methods. It resides in the 'Dedicated Probabilistic Robustness Training' leaf, which contains five papers total including this work. This leaf sits within the broader 'Probabilistic Robustness Training Methods' branch, indicating a relatively focused research direction. The taxonomy shows this is a moderately populated area, distinct from the larger 'Adversarial Training Approaches' branch with its multiple subtopics and numerous papers. The benchmark contribution targets a gap in systematic evaluation protocols for methods optimizing probabilistic rather than worst-case robustness metrics.

The taxonomy reveals neighboring work in 'Probabilistic Robustness Verification and Certification' (four papers) and 'Standard Adversarial Training' (five papers), suggesting the field balances empirical training methods with formal verification approaches. The 'Benchmarking and Evaluation Frameworks' leaf contains only two papers, highlighting limited prior work on systematic assessment tools. The paper bridges probabilistic training methods and evaluation frameworks, connecting to adversarial training comparisons while maintaining focus on statistical robustness measures. The taxonomy's scope and exclude notes clarify that this work differs from worst-case adversarial robustness benchmarks by emphasizing stochastic perturbation scenarios.

Thirty candidate papers were examined in total, ten per contribution. The benchmark contribution (Contribution A) was not clearly refuted by any of its ten candidates, suggesting novelty in comprehensive evaluation protocols. The theoretical generalization framework (Contribution B) encountered two refutable candidates among its ten, indicating some overlap with existing generalization-analysis literature. The risk-based training formulation (Contribution C) likewise found no refutations among its ten candidates. These statistics reflect a limited search scope focused on top semantic matches, not exhaustive coverage. Within this constrained examination, the benchmark and formulation contributions appear more distinctive than the theoretical analysis component.

Based on thirty candidates from semantic search, the work appears to occupy a relatively underexplored niche in probabilistic robustness evaluation. The taxonomy structure confirms sparse prior work in benchmarking frameworks specifically for probabilistic metrics. However, the limited search scope means potential overlaps in broader robustness literature or recent preprints may not be captured. The analysis suggests moderate novelty for the benchmark and formulation, with the theoretical component showing more substantial connections to existing generalization theory.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 30
Refutable papers: 2

Research Landscape Overview

Core task: Benchmarking training methods for improving probabilistic robustness in deep learning. The field encompasses a diverse set of approaches organized into several major branches.

Probabilistic Robustness Training Methods focus on techniques that explicitly target probabilistic guarantees and uncertainty quantification, including dedicated methods like Adversarial Probabilistic Training[1] and Intrinsic Probabilistic Robustness[24]. Adversarial Training Approaches emphasize defenses against worst-case perturbations, while Robust Training Under Data Quality Challenges addresses noisy labels and corrupted inputs through methods such as Probabilistic Data Filtering[9] and Feature Purification[10]. Bayesian and Probabilistic Deep Learning explores uncertainty estimation via Bayesian Deep Learning[8] and related frameworks like Uncertainty Baselines[4]. Architectural and Optimization Innovations introduce novel training strategies, including Sinusoidal Robust Training[5] and Lipschitz Bounds Training[30]. Benchmarking and Evaluation Frameworks provide systematic assessments, exemplified by works like Certified Robustness SoK[2] and Robust Deep Learning Competition[16]. Finally, Application-Specific Robustness tailors methods to domains such as medical imaging and graph neural networks.

A particularly active line of work centers on certified probabilistic guarantees, where Tight Probabilistic Verification[3] and Certified Probabilistic Robustness[6] explore formal bounds on model behavior under distributional shifts. These contrast with empirical robustness approaches that prioritize practical performance on benchmarks without strict guarantees.
The Probabilistic Robustness Benchmark[0] sits squarely within the Dedicated Probabilistic Robustness Training cluster, providing a systematic evaluation framework that complements theoretical works like Tight Probabilistic Verification[3] while offering more comprehensive empirical comparisons than the Probabilistic Robustness Guide[42]. Unlike Adversarial Probabilistic Training[1], which blends adversarial and probabilistic objectives, the benchmark emphasizes rigorous assessment across diverse training methods. Open questions remain around the trade-offs between computational cost, tightness of probabilistic bounds, and generalization to real-world distribution shifts, with ongoing efforts to bridge the gap between certified methods and scalable practical solutions.

Claimed Contributions

PRBench: First Benchmark for Probabilistic Robustness Training Methods

The authors develop PRBench, the first systematic benchmark specifically designed to evaluate training methods for improving probabilistic robustness. It includes 222 trained models across 7 datasets and 10 architectures, evaluating methods using comprehensive metrics covering clean accuracy, PR and AR performance, training efficiency, and generalization error.

10 retrieved papers
Theoretical Generalization Error Analysis Framework

The authors provide a unified theoretical framework based on Uniform Stability Analysis to derive generalization error bounds for different training methods. This includes theorems characterizing the Lipschitz and smoothness properties of adversarial training objectives with and without regularization, explaining why risk-based training methods achieve lower generalization error.

10 retrieved papers
Can Refute (2 candidate papers)
General Formulation of Risk-based Training for Probabilistic Robustness

The authors formalize a general mathematical framework (Definition 3) for risk-based training methods that target probabilistic robustness. This formulation unifies existing PR-targeted training approaches by defining them as minimizing statistical risks over distributional perturbations rather than worst-case adversarial examples.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

PRBench: First Benchmark for Probabilistic Robustness Training Methods

The authors develop PRBench, the first systematic benchmark specifically designed to evaluate training methods for improving probabilistic robustness. It includes 222 trained models across 7 datasets and 10 architectures, evaluating methods using comprehensive metrics covering clean accuracy, PR and AR performance, training efficiency, and generalization error.

Contribution

Theoretical Generalization Error Analysis Framework

The authors provide a unified theoretical framework based on Uniform Stability Analysis to derive generalization error bounds for different training methods. This includes theorems characterizing the Lipschitz and smoothness properties of adversarial training objectives with and without regularization, explaining why risk-based training methods achieve lower generalization error.
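As background for this comparison, uniform-stability analyses of this kind typically instantiate the classical bound of Bousquet and Elisseeff; the paper's own theorems (which further characterize Lipschitz and smoothness constants of the training objectives) are not reproduced in this report, so the template below is offered only as orientation. If a learning algorithm $A$ is $\beta$-uniformly stable with respect to a loss bounded by $M$, then with probability at least $1-\delta$ over an i.i.d. training set $S$ of size $n$,

```latex
R(A_S) \;\le\; \widehat{R}_S(A_S) \;+\; 2\beta \;+\; \bigl(4n\beta + M\bigr)\sqrt{\frac{\ln(1/\delta)}{2n}}
```

where $R$ is the population risk and $\widehat{R}_S$ the empirical risk. Under this template, a training objective with a smaller stability constant $\beta$ (e.g., because its loss surface is smoother) enjoys a tighter generalization-error bound, which is the mechanism the paper's analysis uses to explain why risk-based training methods achieve lower GE.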

Contribution

General Formulation of Risk-based Training for Probabilistic Robustness

The authors formalize a general mathematical framework (Definition 3) for risk-based training methods that target probabilistic robustness. This formulation unifies existing PR-targeted training approaches by defining them as minimizing statistical risks over distributional perturbations rather than worst-case adversarial examples.
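The formulation described above replaces adversarial training's worst-case inner maximization with a statistical risk over a perturbation distribution. A toy sketch for a linear classifier, under our own assumptions (the function and the logistic-loss choice are illustrative stand-ins, not the paper's Definition 3):

```python
import numpy as np

def risk_based_loss_grad(w, x, y, eps=0.1, k=20, rng=None):
    """Illustrative risk-based objective for a linear classifier: the
    empirical mean of the logistic loss over k perturbations of x drawn
    uniformly from an L-infinity ball (a Monte Carlo stand-in for the
    statistical risk over the perturbation distribution), together with
    its gradient in w."""
    rng = np.random.default_rng(rng)
    deltas = rng.uniform(-eps, eps, size=(k, x.size))
    losses, grads = [], []
    for d in deltas:
        z = y * (w @ (x + d))                      # margin on perturbed input
        losses.append(np.log1p(np.exp(-z)))        # logistic loss
        grads.append(-y * (x + d) / (1.0 + np.exp(z)))
    return np.mean(losses), np.mean(grads, axis=0)

# One SGD step against the perturbation-averaged risk.
w = np.zeros(3)
x, y = np.array([1.0, -0.5, 0.2]), 1
loss, g = risk_based_loss_grad(w, x, y, rng=0)
w -= 0.5 * g
```

The key design difference from AT is visible in the inner loop: the loss is averaged over sampled perturbations rather than maximized over them, so gradients reflect the expected (probabilistic) behavior in the ball, which is what connects this formulation to the PR metric the benchmark evaluates.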