Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Overparametrization, Loss landscapes, Signal recovery, High-dimensional learning
Abstract:

High-dimensional non-convex loss landscapes play a central role in the theory of Machine Learning. Gaining insight into how these landscapes interact with gradient-based optimization methods, even in relatively simple models, can shed light on this enigmatic feature of neural networks. In this work we focus on a prototypical simple learning problem that generalizes the Phase Retrieval inference problem by allowing the exploration of overparametrized settings. Using techniques from field theory, we analyze the spectrum of the Hessian at initialization and identify a Baik–Ben Arous–Péché (BBP) transition in the amount of data that separates regimes where the initialization is informative or uninformative about a planted signal in a teacher-student setup. Crucially, we demonstrate how overparametrization can "bend" the loss landscape, shifting the transition point, in the limit of large overparametrization all the way down to the information-theoretic weak-recovery threshold, while also altering the transition's qualitative nature. We distinguish between continuous and discontinuous BBP transitions, and support our analytical predictions with simulations that probe the finite-N behavior. In the case of discontinuous BBP transitions, strong finite-N corrections allow information to be retrieved at a signal-to-noise ratio (SNR) below the predicted BBP transition; for these cases we provide estimates of a new, lower SNR threshold marking the point at which the initialization becomes entirely uninformative.
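For readers less familiar with the BBP phenomenon the abstract invokes, the sketch below illustrates it in the simplest textbook setting: a rank-one spike added to a GOE random matrix. This is a stand-in for the Hessian, not the paper's model; the dimension n and the SNR values theta are arbitrary illustrative choices.

```python
import numpy as np

# Minimal BBP demo: rank-one spike theta * v v^T added to a GOE matrix.
# The bulk edge sits at 2; for theta > 1 an outlier detaches at
# theta + 1/theta and the top eigenvector acquires squared overlap
# ~ 1 - theta^(-2) with the planted direction v.
rng = np.random.default_rng(0)
n = 2000
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

for theta in (0.5, 1.5, 3.0):
    A = rng.standard_normal((n, n))
    W = (A + A.T) / np.sqrt(2 * n)      # GOE normalization, bulk edge at 2
    M = theta * np.outer(v, v) + W
    vals, vecs = np.linalg.eigh(M)
    overlap = abs(vecs[:, -1] @ v)      # |<top eigenvector, signal>|
    print(f"theta={theta:3.1f}  lambda_max={vals[-1]:5.2f}  overlap={overlap:4.2f}")
```

Above theta = 1 the top eigenvalue detaches from the bulk and its eigenvector carries macroscopic information about v; below, the overlap is of order n^(-1/2). This detachment of an informative outlier is the mechanism the paper tracks in the Hessian at initialization.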

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a field-theoretic analysis of Hessian spectra at initialization in overparametrized neural networks, identifying Baik–Ben Arous–Péché (BBP) transitions that separate informative from uninformative initialization regimes. It resides in the 'Phase Transitions and Critical Phenomena at Initialization' leaf, which contains only two papers total. This sparse population suggests the paper addresses a relatively specialized research direction within the broader Hessian analysis landscape, focusing on critical phenomena rather than general spectral characterization or empirical measurement.

The taxonomy tree reveals that the paper's immediate parent branch, 'Theoretical Characterization of Hessian Spectral Properties', contains a sibling leaf on 'Asymptotic Spectral Analysis and Random Matrix Theory' with three papers. Neighboring branches include 'Empirical Analysis of Hessian Structure' (three papers across two leaves) and 'Initialization Schemes and Their Impact' (four papers). The paper's use of field theory and random matrix techniques connects it to asymptotic spectral work, while its focus on overparametrization and information-theoretic thresholds distinguishes it from purely empirical eigenvalue distribution studies or initialization scheme proposals.

Among thirty candidates examined, none were found to clearly refute any of the three main contributions. Contribution A (BBP transitions in overparametrized networks) examined ten candidates with zero refutable matches; Contribution B (continuous versus discontinuous transitions) and Contribution C (weak-recovery threshold via infinite overparametrization) each examined ten candidates with identical outcomes. This suggests that within the limited search scope, the specific combination of BBP transition analysis, overparameterization effects, and information-theoretic threshold characterization appears relatively unexplored, though the absence of refutations does not guarantee exhaustive novelty.

Given the sparse taxonomy leaf (two papers) and zero refutations across thirty candidates, the work appears to occupy a distinct niche within Hessian initialization theory. However, the limited search scope means potentially relevant work in statistical physics, phase retrieval, or teacher-student frameworks outside the top-thirty semantic matches may not have been captured. The analysis reflects novelty within the examined literature but cannot rule out overlooked connections in adjacent fields.

Taxonomy

Core-task Taxonomy Papers: 16
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Hessian spectrum analysis at initialization in overparametrized neural networks. The field examines how the eigenvalue distribution of the loss Hessian at initialization shapes subsequent training and generalization. The taxonomy organizes this landscape into several main branches. Theoretical Characterization of Hessian Spectral Properties investigates phase transitions, critical phenomena, and asymptotic spectral laws that emerge as network width or depth grows, often drawing on random matrix theory and statistical mechanics perspectives. Empirical Analysis of Hessian Structure and Eigenvalues focuses on measuring and quantifying eigenvalue distributions, bulk versus outlier structure, and the role of neglected components in real networks. Initialization Schemes and Their Impact studies how different weight-scaling strategies (e.g., Xavier, He, or novel parameterizations) alter the Hessian spectrum and downstream optimization. Training Dynamics and Hessian Evolution tracks how eigenvalues shift during gradient descent, revealing saddle-point structure and time-dependent spectral changes. Finally, Specialized Architectures and Reparameterizations explores how architectural choices (residual connections, normalization layers, or custom parameterizations) modify the Hessian at initialization.

A particularly active line of work examines phase transitions and critical regimes where small changes in initialization scale trigger qualitative shifts in the Hessian spectrum, as seen in BBP Transitions Initialization[0] and Goldilocks Zone Initialization[11], which identify narrow windows of initialization variance that balance trainability and feature learning. These studies contrast with broader empirical investigations like Quantifying Hessian Structure[3] and Neglected Hessian Component[8], which document how bulk eigenvalue distributions and often-ignored spectral components influence optimization trajectories.

BBP Transitions Initialization[0] sits squarely within the theoretical characterization branch, emphasizing critical phenomena at initialization and connecting spectral properties to subsequent training phases. Its focus on phase boundaries complements works like Asymptotic Hessian Spectrum[13], which derives limiting spectral densities, and Goldilocks Zone Initialization[11], which empirically validates the existence of optimal initialization regimes. Together, these efforts reveal that initialization is not merely a practical detail but a window into the geometry and trainability of overparametrized models.

Claimed Contributions

Contribution A: Analysis of BBP transitions in overparametrized neural networks at initialization

The authors apply field-theoretic techniques to study the Hessian spectrum at initialization in a teacher-student setup with two-layer networks. They characterize the BBP transition that determines when random initialization contains information about the teacher signal, extending this analysis to overparametrized settings beyond standard phase retrieval.

(10 candidate papers compared; no refutations found.)
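To give a rough sense of the kind of experiment behind this contribution, here is a sketch for the plain (non-overparametrized) phase-retrieval loss in a teacher-student setup. The quartic loss, the sample ratio alpha, and the choice to probe the bottom edge of the Hessian spectrum are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 400
alpha = 12.0                             # sample ratio N/d (illustrative)
N = int(alpha * d)

w_star = rng.standard_normal(d)          # teacher / planted signal
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((N, d))          # Gaussian inputs
y = (X @ w_star) ** 2                    # noiseless phase-retrieval labels

w0 = rng.standard_normal(d)              # random (uninformative) initialization
w0 /= np.linalg.norm(w0)

# For L(w) = (1/4N) * sum_i ((x_i.w)^2 - y_i)^2, the Hessian at w is
#   H = (1/N) * sum_i (3 (x_i.w)^2 - y_i) x_i x_i^T
h = X @ w0
H = (X.T * (3 * h**2 - y)) @ X / N

vals, vecs = np.linalg.eigh(H)
# The -y_i term plants a negative rank-one component along w_star; when
# alpha exceeds this model's BBP point, an outlier detaches at the bottom
# edge and its eigenvector overlaps macroscopically with the teacher.
print(f"lambda_min = {vals[0]:.2f}, |<v_min, w*>| = {abs(vecs[:, 0] @ w_star):.2f}")
```

Where exactly the outlier detaches depends on the model and the loss normalization, which is precisely the dependence the paper characterizes analytically.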
Contribution B: Characterization of continuous versus discontinuous BBP transitions under overparametrization

The authors identify and distinguish two qualitatively different types of BBP transitions (continuous and discontinuous) that arise depending on overparametrization level and loss normalization. They show that higher overparametrization systematically leads to discontinuous transitions with strong finite-size effects.

(10 candidate papers compared; no refutations found.)
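For context, the distinction can be phrased against the textbook rank-one spiked Wigner result, where the transition is continuous: the outlier eigenvalue leaves the bulk edge smoothly and the squared overlap grows continuously from zero. The classical formulas (standard results, not the paper's overparametrized expressions) read:

```latex
% Classical rank-one spiked Wigner (GOE) results, as n -> infinity:
\lambda_{\max}(\theta) \;\to\;
\begin{cases}
2, & \theta \le 1,\\
\theta + \theta^{-1}, & \theta > 1,
\end{cases}
\qquad
\bigl|\langle v_{\max}, v^{*} \rangle\bigr|^{2} \;\to\;
\begin{cases}
0, & \theta \le 1,\\
1 - \theta^{-2}, & \theta > 1.
\end{cases}
```

Here the squared overlap vanishes linearly just above the threshold; in the discontinuous case identified by the authors it instead jumps to a finite value at the transition, which is also where the strong finite-size effects they report become most visible.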
Contribution C: Demonstration that infinite overparametrization achieves the information-theoretic weak-recovery threshold

The authors prove that in the limit of infinite overparametrization, the BBP transition threshold converges to the information-theoretic weak-recovery threshold. This shows that spectral analysis of the Hessian at initialization can match optimal recovery performance through overparametrization alone.

(10 candidate papers compared; no refutations found.)
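One way to probe such a threshold numerically, reusing the Hessian construction sketched under Contribution A, is to scan the sample ratio and watch where the bottom-edge eigenvector first picks up overlap with the teacher. The scan grid, the dimension d, and the noise-floor comparison are illustrative assumptions.

```python
import numpy as np

def bottom_overlap(alpha, d=300, seed=0):
    """Overlap between the bottom Hessian eigenvector and the teacher for
    the plain phase-retrieval loss at a random initialization (sketch)."""
    rng = np.random.default_rng(seed)
    N = int(alpha * d)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    w0 = rng.standard_normal(d)
    w0 /= np.linalg.norm(w0)
    X = rng.standard_normal((N, d))
    y = (X @ w_star) ** 2
    h = X @ w0
    H = (X.T * (3 * h**2 - y)) @ X / N
    _, vecs = np.linalg.eigh(H)
    return abs(vecs[:, 0] @ w_star)

# Below the transition the overlap sits at the O(1/sqrt(d)) noise floor;
# above it, it becomes macroscopic.
for alpha in (0.5, 2.0, 4.0, 8.0, 16.0):
    print(f"alpha={alpha:4.1f}  overlap={bottom_overlap(alpha):.2f}")
```

At finite d the crossover is smooth rather than sharp, consistent with the paper's observation that finite-N corrections can make information retrievable below the asymptotic BBP threshold.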
