Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Scaling laws; Neural networks; LASSO and matrix compressed sensing; Random matrix theory; Approximate message passing; High-dimensional statistics
Abstract:

Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.
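To make the LASSO connection concrete, the following is a minimal background sketch of the standard identity for diagonal linear networks (the paper's exact model, scaling, and regularization may differ, so this is context rather than a restatement of its results). Writing the effective weights of a diagonal network f(x) = sum_i u_i v_i x_i as beta_i = u_i v_i, l2 weight decay on (u, v) collapses to an l1 penalty on beta:

```latex
% Minimizing the weight-decay penalty over all factorizations of a fixed \beta_i,
% by the AM--GM inequality, with equality at |u_i| = |v_i| = \sqrt{|\beta_i|}:
\min_{u_i v_i = \beta_i} \ \frac{\lambda}{2}\left(u_i^{2} + v_i^{2}\right) \;=\; \lambda\,\lvert\beta_i\rvert .
```

Summing over coordinates, training the factorized model with weight decay lambda behaves like LASSO with penalty lambda * ||beta||_1; the analogous reduction for quadratic networks typically yields a nuclear-norm penalty on the effective weight matrix, which is the bridge to matrix compressed sensing.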

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper derives scaling laws and phase diagrams for quadratic and diagonal neural networks in the feature learning regime, connecting excess risk to sample complexity and weight decay through matrix compressed sensing and LASSO theory. It resides in the 'Scaling Laws and Phase Transitions' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. The sibling papers in this leaf focus on related but distinct aspects: one examines hidden-structure exploitation and another addresses scaling in different network configurations, suggesting that this is an emerging area with limited prior theoretical characterization.

The taxonomy reveals that this work sits at the intersection of several active research threads. Its parent branch 'Theoretical Foundations of Feature Learning and Scaling' contains neighboring leaves on infinite-width limits, finite-width dynamics, and kernel-to-feature transitions—all addressing complementary aspects of feature learning theory. The paper's focus on finite-width shallow networks distinguishes it from infinite-width mean-field approaches while connecting to the broader question of when and how networks transition from kernel to feature learning regimes. The taxonomy's 'Empirical Scaling Behavior' branch contains parallel work on transformers and deep networks, highlighting that rigorous theoretical scaling analysis for shallow feature-learning networks occupies a distinct niche.

Among thirty candidates examined, the contribution-level analysis reveals mixed novelty signals. The first contribution (systematic scaling analysis for quadratic/diagonal networks) found zero refutable candidates among ten examined, suggesting this specific architectural focus may be novel within the limited search scope. The second contribution (spectral characterization across phases) similarly showed no clear refutations in ten candidates. However, the third contribution (theoretical validation of spectra-generalization connection) identified one refutable candidate among ten examined, indicating some prior theoretical work exists on linking weight spectra to generalization, though the specific first-principles derivation in this feature-learning context may still offer new insights.

The analysis suggests moderate novelty given the limited search scope of thirty semantically similar papers. The architectural focus on quadratic and diagonal networks in feature learning appears relatively unexplored, while the spectra-generalization connection has some theoretical precedent. The sparse population of the 'Scaling Laws and Phase Transitions' leaf (three papers) and the absence of clear refutations for most contributions indicate this work likely advances the theoretical understanding of shallow network scaling, though a more exhaustive literature search would be needed to definitively assess its novelty relative to the broader compressed sensing and statistical learning theory communities.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: understanding scaling laws for shallow neural networks in the feature learning regime. The field has organized itself around several complementary perspectives. Theoretical Foundations of Feature Learning and Scaling investigates phase transitions, infinite-width limits, and the mathematical underpinnings of how networks learn representations rather than merely interpolating via kernel methods; works such as Feature Learning Infinite Width[1] and Scaling Laws Hidden Structure[4] exemplify this branch. Empirical Scaling Behavior and Architecture Design focuses on practical scaling trends across different architectures, including transformers and alternative designs such as Scaling Laws Transformers[3]. Feature Learning Mechanisms and Inductive Biases examines what kinds of features networks prefer to learn, including simplicity biases and the role of initialization, while Learning Algorithms and Training Procedures studies how optimization dynamics, such as large learning rates or greedy layerwise training, affect feature emergence. Application-Specific Architectures and Domain Adaptations, together with Network Analysis and Interpretability, round out the taxonomy by addressing domain-specific constraints and methods for understanding learned representations.

A particularly active line of work explores the transition from lazy (kernel) to rich (feature-learning) regimes as network width and training scale vary. Emergence Scaling SGD[28] and Feature Learning Scaling Laws[30] investigate how feature learning emerges with scale, while Simplicity Bias Shallow Networks[7] and Shallow Networks Curse Dimensionality[11] highlight trade-offs between expressiveness and sample complexity in shallow architectures. Scaling Laws Feature Learning[0] sits squarely within this theoretical cluster, focusing on rigorous characterizations of how shallow networks' scaling behavior changes when they actively learn features. Its emphasis on phase transitions and finite-width effects contrasts with the infinite-width perspective of Feature Learning Infinite Width[1] and complements the empirical focus of Feature Learning Scaling Laws[30], offering a bridge between asymptotic theory and practical scaling phenomena in the feature learning regime.

Claimed Contributions

Systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime

The authors provide a comprehensive theoretical characterization of how excess risk scales with sample size and regularization strength for two shallow network architectures (diagonal and quadratic networks) that exhibit genuine feature learning. They derive a complete phase diagram showing distinct scaling regimes and crossovers between them.

Candidate papers retrieved for comparison: 10
Precise characterization of spectral properties of trained network weights across all phases

The authors derive exact formulas for the eigenvalue distribution of the learned weights across all training phases. They show that the spectrum of the learned weights is a noisy, soft-thresholded version of the target spectrum, consisting of spikes, a bulk component, and zero eigenvalues depending on the training regime.

Candidate papers retrieved for comparison: 10
First-principles theoretical validation of spectra-generalization connection

The authors establish a universal error decomposition that directly connects spectral features (bulk, spikes, outliers) to distinct error components (overfitting, underfitting, approximation error). This provides a mathematical foundation for empirical observations that heavy-tailed weight spectra correlate with better generalization.

Candidate papers retrieved for comparison: 10
Status: Can Refute (one refutable candidate identified among the ten examined)

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

Systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime

The authors provide a comprehensive theoretical characterization of how excess risk scales with sample size and regularization strength for two shallow network architectures (diagonal and quadratic networks) that exhibit genuine feature learning. They derive a complete phase diagram showing distinct scaling regimes and crossovers between them.
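As a purely illustrative companion to this claim, the sketch below shows what measuring a scaling exponent looks like in practice: generate an excess-risk curve that decays as a power law in the sample size before flattening into a plateau, then read the exponent off a log-log fit. The exponent 0.5, the plateau level, and the synthetic curve are assumptions chosen for illustration, not values derived in the paper.

```python
import numpy as np

# Hypothetical excess-risk curve: a power law in the sample size n plus a plateau floor,
# mimicking the crossover/plateau behaviour discussed above.
# The exponent (0.5) and the floor (1e-3) are illustrative assumptions, not the paper's values.
n = np.logspace(2, 5, 20)
excess_risk = 2.0 * n ** (-0.5) + 1e-3

# Estimate the scaling exponent by least squares on the log-log curve,
# restricted to the regime where the power law dominates the plateau.
mask = excess_risk > 1e-2
slope, _ = np.polyfit(np.log(n[mask]), np.log(excess_risk[mask]), 1)
print(f"fitted scaling exponent: {-slope:.2f}")  # recovers roughly the assumed 0.5
```

A phase diagram in the paper's sense would report how such an exponent changes as the sample complexity and the weight-decay strength are varied jointly.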

Contribution 2

Precise characterization of spectral properties of trained network weights across all phases

The authors derive exact formulas for the eigenvalue distribution of the learned weights across all training phases. They show that the spectrum of the learned weights is a noisy, soft-thresholded version of the target spectrum, consisting of spikes, a bulk component, and zero eigenvalues depending on the training regime.
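The "noisy, soft-thresholded spectrum" picture can be illustrated with the standard soft-thresholding operator from LASSO theory. In the minimal sketch below, the target spectrum, noise level, and threshold are illustrative assumptions; the paper derives the actual effective noise scale and threshold from its theory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative target spectrum: a few spikes plus exact zeros.
target_spectrum = np.array([5.0, 3.0, 1.0, 0.0, 0.0, 0.0])
noise = 0.3 * rng.standard_normal(target_spectrum.shape)  # assumed effective observation noise
threshold = 0.5                                            # plays the role of the weight-decay level

def soft_threshold(x, t):
    """Standard soft-thresholding operator used in LASSO-type estimators."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

learned_spectrum = soft_threshold(target_spectrum + noise, threshold)
print(learned_spectrum)
# Large target eigenvalues survive as shrunk spikes, small or zero ones are set to zero,
# and any noise that clears the threshold shows up as a spurious bulk.
```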

Contribution 3

First-principles theoretical validation of spectra-generalization connection

The authors establish a universal error decomposition that directly connects spectral features (bulk, spikes, outliers) to distinct error components (overfitting, underfitting, approximation error). This provides a mathematical foundation for empirical observations that heavy-tailed weight spectra correlate with better generalization.
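Schematically, such a decomposition splits the excess risk by spectral component. The grouping below is only an illustration of what a bulk/spike error decomposition looks like; the paper's exact statement, normalization, and assignment of spectral features to error terms may differ.

```latex
% Illustrative decomposition: \hat{\lambda}_i = learned eigenvalues, \lambda_i^* = target eigenvalues.
\mathcal{E} \;\approx\;
\underbrace{\sum_{i \in \mathrm{bulk}} \hat{\lambda}_i^{\,2}}_{\text{overfitting: noise eigenvalues surviving the threshold}}
\;+\;
\underbrace{\sum_{i \in \mathrm{missed\ spikes}} \bigl(\lambda_i^{*}\bigr)^{2}}_{\text{underfitting: signal eigenvalues shrunk to zero}}
\;+\;
\underbrace{\sum_{i \in \mathrm{retained\ spikes}} \bigl(\hat{\lambda}_i - \lambda_i^{*}\bigr)^{2}}_{\text{approximation error: bias of the recovered spikes}}
```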
