Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Overparametrization, Loss landscapes, Signal recovery, High-dimensional learning
Abstract:

High-dimensional non-convex loss landscapes play a central role in the theory of Machine Learning. Gaining insight into how these landscapes interact with gradient-based optimization methods, even in relatively simple models, can shed light on this enigmatic feature of neural networks. In this work we focus on a prototypical simple learning problem that generalizes the Phase Retrieval inference problem by allowing the exploration of overparametrized settings. Using techniques from field theory, we analyze the spectrum of the Hessian at initialization and identify a Baik–Ben Arous–Péché (BBP) transition in the amount of data that separates regimes where the initialization is informative or uninformative about a planted signal in a teacher-student setup. Crucially, we demonstrate how overparametrization can "bend" the loss landscape, shifting the transition point, in the limit of large overparametrization all the way down to the information-theoretic weak-recovery threshold, while also altering the transition's qualitative nature. We distinguish between continuous and discontinuous BBP transitions, and support our analytical predictions with simulations that probe the finite-N behavior. In the case of discontinuous BBP transitions, strong finite-N corrections allow information to be retrieved at a signal-to-noise ratio (SNR) below the predicted BBP transition; for these cases we provide estimates of a new, lower SNR threshold marking the point at which the initialization becomes entirely uninformative.
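For readers less familiar with the BBP phenomenon the abstract invokes, the sketch below illustrates it in the simplest textbook setting: a rank-one spike added to a GOE random matrix. This is a stand-in for the Hessian, not the paper's model; the dimension n and the SNR values theta are arbitrary illustrative choices.

```python
import numpy as np

# Minimal BBP demo: rank-one spike theta * v v^T added to a GOE matrix.
# The bulk edge sits at 2; for theta > 1 an outlier detaches at
# theta + 1/theta and the top eigenvector acquires squared overlap
# ~ 1 - theta^(-2) with the planted direction v.
rng = np.random.default_rng(0)
n = 2000
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

for theta in (0.5, 1.5, 3.0):
    A = rng.standard_normal((n, n))
    W = (A + A.T) / np.sqrt(2 * n)      # GOE normalization, bulk edge at 2
    M = theta * np.outer(v, v) + W
    vals, vecs = np.linalg.eigh(M)
    overlap = abs(vecs[:, -1] @ v)      # |<top eigenvector, signal>|
    print(f"theta={theta:3.1f}  lambda_max={vals[-1]:5.2f}  overlap={overlap:4.2f}")
```

Above theta = 1 the top eigenvalue detaches from the bulk and its eigenvector carries macroscopic information about v; below, the overlap is of order n^(-1/2). This detachment of an informative outlier is the mechanism the paper tracks in the Hessian at initialization.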

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a field-theoretic analysis of Hessian spectra at initialization in overparametrized neural networks, identifying Baik–Ben Arous–Péché (BBP) transitions that separate informative from uninformative initialization regimes. It resides in the 'Phase Transitions and Critical Phenomena at Initialization' leaf, which contains only two papers total. This sparse population suggests the paper addresses a relatively specialized research direction within the broader Hessian analysis landscape, focusing on critical phenomena rather than general spectral characterization or empirical measurement.

The taxonomy tree reveals that the paper's immediate parent branch, 'Theoretical Characterization of Hessian Spectral Properties', contains a sibling leaf on 'Asymptotic Spectral Analysis and Random Matrix Theory' with three papers. Neighboring branches include 'Empirical Analysis of Hessian Structure' (three papers across two leaves) and 'Initialization Schemes and Their Impact' (four papers). The paper's use of field theory and random matrix techniques connects it to asymptotic spectral work, while its focus on overparametrization and information-theoretic thresholds distinguishes it from purely empirical eigenvalue distribution studies or initialization scheme proposals.

Among thirty candidates examined, none were found to clearly refute any of the three main contributions. Contribution A (BBP transitions in overparametrized networks) examined ten candidates with zero refutable matches; Contribution B (continuous versus discontinuous transitions) and Contribution C (weak-recovery threshold via infinite overparametrization) each examined ten candidates with identical outcomes. This suggests that within the limited search scope, the specific combination of BBP transition analysis, overparameterization effects, and information-theoretic threshold characterization appears relatively unexplored, though the absence of refutations does not guarantee exhaustive novelty.

Given the sparse taxonomy leaf (two papers) and zero refutations across thirty candidates, the work appears to occupy a distinct niche within Hessian initialization theory. However, the limited search scope means potentially relevant work in statistical physics, phase retrieval, or teacher-student frameworks outside the top-thirty semantic matches may not have been captured. The analysis reflects novelty within the examined literature but cannot rule out overlooked connections in adjacent fields.

Taxonomy

Core-task Taxonomy Papers: 16
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Hessian spectrum analysis at initialization in overparametrized neural networks. The field examines how the eigenvalue distribution of the loss Hessian at initialization shapes subsequent training and generalization. The taxonomy organizes this landscape into several main branches. Theoretical Characterization of Hessian Spectral Properties investigates phase transitions, critical phenomena, and asymptotic spectral laws that emerge as network width or depth grows, often drawing on random matrix theory and statistical mechanics perspectives. Empirical Analysis of Hessian Structure and Eigenvalues focuses on measuring and quantifying eigenvalue distributions, bulk versus outlier structure, and the role of neglected components in real networks. Initialization Schemes and Their Impact studies how different weight-scaling strategies (e.g., Xavier, He, or novel parameterizations) alter the Hessian spectrum and downstream optimization. Training Dynamics and Hessian Evolution tracks how eigenvalues shift during gradient descent, revealing saddle-point structure and time-dependent spectral changes. Finally, Specialized Architectures and Reparameterizations explores how architectural choices (residual connections, normalization layers, or custom parameterizations) modify the Hessian at initialization.

A particularly active line of work examines phase transitions and critical regimes where small changes in initialization scale trigger qualitative shifts in the Hessian spectrum, as seen in BBP Transitions Initialization[0] and Goldilocks Zone Initialization[11], which identify narrow windows of initialization variance that balance trainability and feature learning. These studies contrast with broader empirical investigations like Quantifying Hessian Structure[3] and Neglected Hessian Component[8], which document how bulk eigenvalue distributions and often-ignored spectral components influence optimization trajectories.

BBP Transitions Initialization[0] sits squarely within the theoretical characterization branch, emphasizing critical phenomena at initialization and connecting spectral properties to subsequent training phases. Its focus on phase boundaries complements works like Asymptotic Hessian Spectrum[13], which derives limiting spectral densities, and Goldilocks Zone Initialization[11], which empirically validates the existence of optimal initialization regimes. Together, these efforts reveal that initialization is not merely a practical detail but a window into the geometry and trainability of overparametrized models.

Claimed Contributions

Contribution A: Analysis of BBP transitions in overparametrized neural networks at initialization

The authors apply field-theoretic techniques to study the Hessian spectrum at initialization in a teacher-student setup with two-layer networks. They characterize the BBP transition that determines when random initialization contains information about the teacher signal, extending this analysis to overparametrized settings beyond standard phase retrieval.

(10 candidate papers compared; no refutations found.)
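To give a rough sense of the kind of experiment behind this contribution, here is a sketch for the plain (non-overparametrized) phase-retrieval loss in a teacher-student setup. The quartic loss, the sample ratio alpha, and the choice to probe the bottom edge of the Hessian spectrum are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 400
alpha = 12.0                             # sample ratio N/d (illustrative)
N = int(alpha * d)

w_star = rng.standard_normal(d)          # teacher / planted signal
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((N, d))          # Gaussian inputs
y = (X @ w_star) ** 2                    # noiseless phase-retrieval labels

w0 = rng.standard_normal(d)              # random (uninformative) initialization
w0 /= np.linalg.norm(w0)

# For L(w) = (1/4N) * sum_i ((x_i.w)^2 - y_i)^2, the Hessian at w is
#   H = (1/N) * sum_i (3 (x_i.w)^2 - y_i) x_i x_i^T
h = X @ w0
H = (X.T * (3 * h**2 - y)) @ X / N

vals, vecs = np.linalg.eigh(H)
# The -y_i term plants a negative rank-one component along w_star; when
# alpha exceeds this model's BBP point, an outlier detaches at the bottom
# edge and its eigenvector overlaps macroscopically with the teacher.
print(f"lambda_min = {vals[0]:.2f}, |<v_min, w*>| = {abs(vecs[:, 0] @ w_star):.2f}")
```

Where exactly the outlier detaches depends on the model and the loss normalization, which is precisely the dependence the paper characterizes analytically.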
Contribution B: Characterization of continuous versus discontinuous BBP transitions under overparametrization

The authors identify and distinguish two qualitatively different types of BBP transitions (continuous and discontinuous) that arise depending on overparametrization level and loss normalization. They show that higher overparametrization systematically leads to discontinuous transitions with strong finite-size effects.

(10 candidate papers compared; no refutations found.)
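For context, the distinction can be phrased against the textbook rank-one spiked Wigner result, where the transition is continuous: the outlier eigenvalue leaves the bulk edge smoothly and the squared overlap grows continuously from zero. The classical formulas (standard results, not the paper's overparametrized expressions) read:

```latex
% Classical rank-one spiked Wigner (GOE) results, as n -> infinity:
\lambda_{\max}(\theta) \;\to\;
\begin{cases}
2, & \theta \le 1,\\
\theta + \theta^{-1}, & \theta > 1,
\end{cases}
\qquad
\bigl|\langle v_{\max}, v^{*} \rangle\bigr|^{2} \;\to\;
\begin{cases}
0, & \theta \le 1,\\
1 - \theta^{-2}, & \theta > 1.
\end{cases}
```

Here the squared overlap vanishes linearly just above the threshold; in the discontinuous case identified by the authors it instead jumps to a finite value at the transition, which is also where the strong finite-size effects they report become most visible.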
Contribution C: Demonstration that infinite overparametrization achieves the information-theoretic weak-recovery threshold

The authors prove that in the limit of infinite overparametrization, the BBP transition threshold converges to the information-theoretic weak-recovery threshold. This shows that spectral analysis of the Hessian at initialization can match optimal recovery performance through overparametrization alone.

(10 candidate papers compared; no refutations found.)
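One way to probe such a threshold numerically, reusing the Hessian construction sketched under Contribution A, is to scan the sample ratio and watch where the bottom-edge eigenvector first picks up overlap with the teacher. The scan grid, the dimension d, and the noise-floor comparison are illustrative assumptions.

```python
import numpy as np

def bottom_overlap(alpha, d=300, seed=0):
    """Overlap between the bottom Hessian eigenvector and the teacher for
    the plain phase-retrieval loss at a random initialization (sketch)."""
    rng = np.random.default_rng(seed)
    N = int(alpha * d)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    w0 = rng.standard_normal(d)
    w0 /= np.linalg.norm(w0)
    X = rng.standard_normal((N, d))
    y = (X @ w_star) ** 2
    h = X @ w0
    H = (X.T * (3 * h**2 - y)) @ X / N
    _, vecs = np.linalg.eigh(H)
    return abs(vecs[:, 0] @ w_star)

# Below the transition the overlap sits at the O(1/sqrt(d)) noise floor;
# above it, it becomes macroscopic.
for alpha in (0.5, 2.0, 4.0, 8.0, 16.0):
    print(f"alpha={alpha:4.1f}  overlap={bottom_overlap(alpha):.2f}")
```

At finite d the crossover is smooth rather than sharp, consistent with the paper's observation that finite-N corrections can make information retrievable below the asymptotic BBP threshold.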
