Variational Deep Learning via Implicit Regularization

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Implicit Regularization, Bayesian Deep Learning, Generalized Variational Inference, Implicit Bias of SGD
Abstract:

Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of architecture, hyperparameters and optimization procedure. However, deep neural networks can be surprisingly non-robust, resulting in overconfident predictions and poor out-of-distribution generalization. Bayesian deep learning addresses this via model averaging, but typically requires significant computational resources as well as carefully elicited priors to avoid overriding the benefits of implicit regularization. Instead, in this work, we propose to regularize variational neural networks solely by relying on the implicit bias of (stochastic) gradient descent. We theoretically characterize this inductive bias in overparametrized linear models as generalized variational inference and demonstrate the importance of the choice of parametrization. Empirically, our approach demonstrates strong in- and out-of-distribution performance without additional hyperparameter tuning and with minimal computational overhead.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Implicit Bias Variational Inference (IBVI), which regularizes variational neural networks by relying solely on the implicit bias of stochastic gradient descent rather than explicit priors or hyperparameter tuning. It resides in the 'Implicit Regularization as Variational Inference' leaf, which contains only three papers total (including this one). This leaf sits within the broader 'Theoretical Foundations of Implicit Regularization' branch, indicating the work occupies a relatively sparse research direction focused on establishing formal connections between optimization dynamics and Bayesian inference frameworks.

The taxonomy reveals neighboring leaves addressing related but distinct perspectives: 'Implicit Regularization in Wide and Overparametrized Networks' examines learning dynamics without explicit variational framing, while 'General Theoretical Perspectives on Bayesian Deep Learning' provides broader reviews. The sibling papers in the same leaf share the goal of interpreting gradient descent as variational inference, but the taxonomy's scope notes clarify that this leaf excludes purely empirical applications (which belong in 'Applied Methods') and meta-learning extensions. The paper thus connects to theoretical characterizations of implicit bias while diverging from explicit sampling methods found in the 'Variational Inference and Gradient-Based Sampling Methods' branch.

Among 25 candidates examined across three contributions, the IBVI method shows one refutable candidate out of 10 examined, suggesting some prior work addresses similar algorithmic ideas. The theoretical characterization of implicit bias as generalized variational inference examined 5 candidates with none refutable, indicating this formalization may offer fresh perspective. The extension of maximal update parametrization to probabilistic networks examined 10 candidates with none refutable, suggesting this parametrization choice is relatively unexplored in the Bayesian setting. The limited search scope (25 candidates, not exhaustive) means these assessments reflect top semantic matches rather than comprehensive field coverage.

Based on the top-25 semantic matches examined, the work appears to contribute novel theoretical framing and parametrization insights within a sparse research direction, though the IBVI method itself encounters some prior overlap. The taxonomy structure confirms this area remains less crowded than applied uncertainty quantification branches, which contain more papers. The analysis does not cover broader optimization literature or recent preprints outside the search scope, so the novelty assessment remains provisional pending deeper investigation of gradient-based Bayesian methods.

Taxonomy

- 22 Core-task Taxonomy Papers
- 3 Claimed Contributions
- 25 Contribution Candidate Papers Compared
- 1 Refutable Paper

Research Landscape Overview

Core task: Bayesian deep learning via implicit regularization of gradient descent. This field explores how standard gradient-based optimization in neural networks can be understood through a Bayesian lens, where the training dynamics themselves induce implicit priors and approximate posterior inference. The taxonomy reveals several complementary perspectives: theoretical foundations examine the mathematical underpinnings of implicit regularization and its connection to variational inference, while variational inference and gradient-based sampling methods develop explicit algorithms that bridge optimization and probabilistic reasoning. Meta-learning branches investigate how gradient-based updates can encode prior knowledge across tasks, and applied methods focus on practical uncertainty quantification for real-world prediction problems. Additional branches address privacy-preserving architectures and model compression, reflecting the need to deploy Bayesian principles in resource-constrained or sensitive settings.

Representative works such as Gradient Regularization Inference[7] and Implicitly Bayesian Prediction[10] illustrate how gradient descent can be reinterpreted as performing approximate Bayesian updates, while studies like Loss Landscapes Generalization[3] connect optimization trajectories to generalization behavior. A central tension in this landscape concerns whether implicit regularization alone suffices for reliable uncertainty estimates or whether explicit variational frameworks are necessary. Some lines of work, including Variational Deep Learning[0], argue that viewing gradient descent as variational inference provides a principled foundation for Bayesian deep learning, closely aligning with Gradient Regularization Inference[7] and Implicitly Bayesian Prediction[10] in emphasizing the implicit Bayesian character of standard training. In contrast, methods like Accelerating SVGD[15] and Gradient-bridged Posterior[12] develop explicit sampling or variational schemes to obtain richer posterior approximations.

Variational Deep Learning[0] sits within the theoretical branch that interprets optimization dynamics as variational inference, sharing conceptual ground with its immediate neighbors but differing in how it formalizes the connection between gradient flow and posterior approximation. This positioning highlights ongoing debates about whether implicit biases of optimizers can replace or merely complement explicit Bayesian machinery for uncertainty-aware learning.

Claimed Contributions

Implicit Bias Variational Inference (IBVI) method

The authors introduce a method for training variational neural networks by maximizing the expected log-likelihood without explicit KL regularization to the prior. Instead, the method exploits the implicit regularization of SGD to prevent uncertainty collapse and achieve robust generalization.
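To make the training recipe concrete, the following is a minimal sketch of this idea on a toy overparametrized linear model: a mean-field Gaussian variational family is trained with SGD on a Monte Carlo estimate of the expected loss only, with no KL term to the prior. The variational family, reparameterization trick, learning rate, and problem sizes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: overparametrized linear regression (more weights than samples).
n, d = 20, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Mean-field Gaussian variational family q(w) = N(mu, diag(exp(rho))^2).
mu = np.zeros(d)
rho = np.full(d, -2.0)  # log of the per-weight standard deviation

lr, n_mc = 0.05, 8
for step in range(2000):
    eps = rng.normal(size=(n_mc, d))
    w = mu + np.exp(rho) * eps            # reparameterization trick
    resid = X @ w.T - y[:, None]          # (n, n_mc) residuals per MC sample
    grad_w = X.T @ resid / n              # gradient of the expected squared loss
    # Key point of the method: no KL term is added below -- regularization
    # is left entirely to the implicit bias of SGD.
    mu -= lr * grad_w.mean(axis=1)
    rho -= lr * (grad_w * eps.T * np.exp(rho)[:, None]).mean(axis=1)

train_mse = float(np.mean((X @ mu - y) ** 2))
print(train_mse)
```

The objective here is purely the Monte Carlo expected loss; any regularizing effect on `mu` and `rho` comes from the optimizer's trajectory rather than an explicit prior-matching penalty.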

10 retrieved papers (1 can refute)
Theoretical characterization of implicit bias as generalized variational inference

The authors prove that for overparametrized linear models, the implicit bias of SGD when training via the expected loss is equivalent to generalized variational inference with a 2-Wasserstein regularizer penalizing deviations from the prior, extending prior results for non-probabilistic models.
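One natural way to write this claimed equivalence down, by analogy with the min-norm characterization of SGD for non-probabilistic linear models, is as a constrained selection problem over expected-loss minimizers. The notation below is assumed for illustration and the paper's exact statement may differ:

```latex
% q: variational distribution over weights, p: prior, \ell: training loss,
% W_2: 2-Wasserstein distance. Among minimizers of the expected loss,
% SGD is claimed to select the one closest to the prior:
\hat{q} \;\in\; \operatorname*{arg\,min}_{q \in \mathcal{Q}^{\star}} \; W_2^2(q, p),
\qquad
\mathcal{Q}^{\star} \;=\; \operatorname*{arg\,min}_{q} \; \mathbb{E}_{w \sim q}\!\left[\ell(w)\right].
```

Read this way, the result is a generalized variational inference objective in which the usual KL divergence to the prior is replaced by a squared 2-Wasserstein regularizer, imposed implicitly by the optimizer rather than added to the loss.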

5 retrieved papers
Extension of maximal update parametrization to probabilistic networks

The authors extend the maximal update parametrization (μP) to variational neural networks, enabling hyperparameter transfer from small to large models and ensuring feature learning even as network width increases, which is demonstrated empirically on CIFAR-10.
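As a rough illustration of what a μP-style parametrization looks like for a variational network, the sketch below applies the standard μP scalings (hidden means with std 1/√fan_in, output means with std 1/fan_in) to the mean parameters of a factorized Gaussian posterior, and checks that hidden-activation scale stays roughly width-independent, which is the property that makes hyperparameter transfer possible. How the variational scale parameters should themselves scale with width is part of the paper's contribution; the choice below (matching the mean-init std) is purely a guess for illustration.

```python
import numpy as np

def mup_variational_params(widths, rng):
    """muP-style init for a variational MLP with factorized Gaussian q(W).

    Hidden means use std 1/sqrt(fan_in); the output layer uses std 1/fan_in
    (the muP output scaling). Setting sigma equal to the mean-init std is an
    illustrative assumption, not the paper's prescription.
    """
    params = []
    for i, (fan_in, fan_out) in enumerate(zip(widths[:-1], widths[1:])):
        is_output = i == len(widths) - 2
        std = 1.0 / fan_in if is_output else 1.0 / np.sqrt(fan_in)
        mu = rng.normal(0.0, std, size=(fan_in, fan_out))
        sigma = np.full((fan_in, fan_out), std)
        params.append((mu, sigma))
    return params

def hidden_rms(widths, x, rng):
    """RMS of the last hidden activation for one reparameterized weight sample."""
    params = mup_variational_params(widths, rng)
    h = x
    for mu, sigma in params[:-1]:                    # stop before the output layer
        W = mu + sigma * rng.normal(size=mu.shape)   # reparameterization trick
        h = np.maximum(h @ W, 0.0)                   # ReLU
    return float(np.sqrt(np.mean(h ** 2)))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
narrow = hidden_rms([16, 128, 128, 1], x, rng)
wide = hidden_rms([16, 2048, 2048, 1], x, rng)
# Under these scalings the activation scale should not blow up or vanish
# as width grows, so hyperparameters tuned at width 128 remain sensible at 2048.
print(narrow, wide)
```

The same width-independence argument is what licenses tuning learning rates on a small model and reusing them on a large one, as the authors demonstrate on CIFAR-10.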

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Implicit Bias Variational Inference (IBVI) method

The authors introduce a method for training variational neural networks by maximizing the expected log-likelihood without explicit KL regularization to the prior. Instead, the method exploits the implicit regularization of SGD to prevent uncertainty collapse and achieve robust generalization.

Contribution

Theoretical characterization of implicit bias as generalized variational inference

The authors prove that for overparametrized linear models, the implicit bias of SGD when training via the expected loss is equivalent to generalized variational inference with a 2-Wasserstein regularizer penalizing deviations from the prior, extending prior results for non-probabilistic models.

Contribution

Extension of maximal update parametrization to probabilistic networks

The authors extend the maximal update parametrization (μP) to variational neural networks, enabling hyperparameter transfer from small to large models and ensuring feature learning even as network width increases, which is demonstrated empirically on CIFAR-10.
