Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
Overview
Overall Novelty Assessment
The paper derives scaling laws and phase diagrams for quadratic and diagonal neural networks in the feature learning regime, connecting excess risk to sample size and weight decay through matrix compressed sensing and LASSO theory. It resides in the 'Scaling Laws and Phase Transitions' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. The sibling papers in this leaf address related but distinct aspects: one examines hidden-structure exploitation, the other addresses scaling in different network configurations. This suggests an emerging area with limited prior theoretical characterization.
The taxonomy reveals that this work sits at the intersection of several active research threads. Its parent branch 'Theoretical Foundations of Feature Learning and Scaling' contains neighboring leaves on infinite-width limits, finite-width dynamics, and kernel-to-feature transitions—all addressing complementary aspects of feature learning theory. The paper's focus on finite-width shallow networks distinguishes it from infinite-width mean-field approaches while connecting to the broader question of when and how networks transition from kernel to feature learning regimes. The taxonomy's 'Empirical Scaling Behavior' branch contains parallel work on transformers and deep networks, highlighting that rigorous theoretical scaling analysis for shallow feature-learning networks occupies a distinct niche.
Among thirty candidates examined, the contribution-level analysis reveals mixed novelty signals. The first contribution (systematic scaling analysis for quadratic/diagonal networks) found zero refutable candidates among ten examined, suggesting this specific architectural focus may be novel within the limited search scope. The second contribution (spectral characterization across phases) similarly showed no clear refutations in ten candidates. However, the third contribution (theoretical validation of spectra-generalization connection) identified one refutable candidate among ten examined, indicating some prior theoretical work exists on linking weight spectra to generalization, though the specific first-principles derivation in this feature-learning context may still offer new insights.
The analysis suggests moderate novelty given the limited search scope of thirty semantically similar papers. The architectural focus on quadratic and diagonal networks in feature learning appears relatively unexplored, while the spectra-generalization connection has some theoretical precedent. The sparse population of the 'Scaling Laws and Phase Transitions' leaf (three papers) and the absence of clear refutations for most contributions indicate this work likely advances the theoretical understanding of shallow network scaling, though a more exhaustive literature search would be needed to definitively assess its novelty relative to the broader compressed sensing and statistical learning theory communities.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors provide a comprehensive theoretical characterization of how excess risk scales with sample size and regularization strength for two shallow network architectures (diagonal and quadratic networks) that exhibit genuine feature learning. They derive a complete phase diagram showing distinct scaling regimes and crossovers between them.
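For context, the kind of sample-size/regularization scaling at issue can be seen in the classical oracle-type bounds from the LASSO and matrix compressed-sensing literature that the paper reportedly builds on. The sketch below states those standard results with illustrative notation (d ambient dimension, s sparsity, r rank, sigma noise level, n sample size, lambda regularization strength); it is a reference point only, not the paper's phase diagram.

```latex
% Classical oracle-type bounds from LASSO / matrix compressed sensing
% (reference points only, not the paper's statements).
% d: ambient dimension, s: sparsity, r: rank, sigma: noise level,
% n: sample size, lambda: regularization (weight-decay) strength.
\begin{align*}
\text{diagonal (sparse) case:} \quad
  \lambda \asymp \sigma\sqrt{\tfrac{\log d}{n}}
  &\;\Longrightarrow\;
  \|\hat w - w^\star\|_2^2 \lesssim \frac{\sigma^2 s \log d}{n}, \\
\text{quadratic (low-rank) case:} \quad
  \lambda \asymp \sigma\sqrt{\tfrac{d}{n}}
  &\;\Longrightarrow\;
  \|\hat W - W^\star\|_F^2 \lesssim \frac{\sigma^2 r d}{n}.
\end{align*}
```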
The authors derive exact formulas for the eigenvalue distribution of learned weights across all training phases. They show that learned weights are noisy, soft-thresholded versions of the target spectrum, with the spectrum consisting of spikes, bulk components, and zero eigenvalues depending on the training regime.
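The phrase "noisy, soft-thresholded versions of the target spectrum" can be made concrete with a minimal numerical sketch. The snippet below is purely illustrative: the dimensions, noise level, threshold tau, and spike cutoff are hypothetical choices, not the paper's construction.

```python
# Minimal sketch (not the paper's construction): a soft-thresholded,
# noise-corrupted spectrum splits into spikes, a small bulk, and exact zeros.
# All sizes, the noise level, and the threshold tau are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d, r = 200, 5                             # ambient dimension, target rank (assumed)
target = np.zeros(d)
target[:r] = np.linspace(3.0, 5.0, r)     # a few large "signal" eigenvalues

noise = rng.normal(scale=0.5, size=d)     # finite-sample noise on the spectrum
tau = 1.0                                 # soft threshold set by weight decay

# Soft-thresholding: shrink toward zero by tau, clip negatives to exact zero.
learned = np.maximum(target + noise - tau, 0.0)

spikes = np.sum(learned > 2.0)                        # surviving signal directions
bulk = np.sum((learned > 0.0) & (learned <= 2.0))     # small noisy eigenvalues
zeros = np.sum(learned == 0.0)                        # directions killed by regularization
print(f"spikes={spikes}, bulk={bulk}, zeros={zeros}")
```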
The authors establish a universal error decomposition that directly connects spectral features (bulk, spikes, outliers) to distinct error components (overfitting, underfitting, approximation error). This provides a mathematical foundation for empirical observations that heavy-tailed weight spectra correlate with better generalization.
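Schematically, a decomposition of this type assigns each spectral component its own error term. The display below is a sketch of the claimed correspondence using illustrative notation (lambda*_i target eigenvalues, hat-lambda_i learned eigenvalues; B, S, Z the bulk, spike, and zeroed index sets); it is not the paper's exact formula.

```latex
% Schematic decomposition (illustrative notation, not the paper's formula):
% \lambda^\star_i are target eigenvalues, \hat\lambda_i are learned eigenvalues,
% and B, S, Z index the bulk, retained spikes, and zeroed directions.
\[
\mathcal{E}(\hat W)
\;\approx\;
\underbrace{\sum_{i \in \mathcal{B}} \hat\lambda_i^{\,2}}_{\text{bulk}\,\to\,\text{overfitting}}
\;+\;
\underbrace{\sum_{i \in \mathcal{S}} \bigl(\hat\lambda_i - \lambda_i^{\star}\bigr)^2}_{\text{shrunken spikes}\,\to\,\text{underfitting}}
\;+\;
\underbrace{\sum_{i \in \mathcal{Z}} \bigl(\lambda_i^{\star}\bigr)^2}_{\text{zeroed directions}\,\to\,\text{approximation error}}
\]
```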
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
Systematic analysis of scaling laws for quadratic and diagonal neural networks in feature learning regime
The authors provide a comprehensive theoretical characterization of how excess risk scales with sample size and regularization strength for two shallow network architectures (diagonal and quadratic networks) that exhibit genuine feature learning. They derive a complete phase diagram showing distinct scaling regimes and crossovers between them.
[61] Scaling laws for learning with real and surrogate data
[62] Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws
[63] Scaling laws are redundancy laws
[64] Statistical Physics of Deep Neural Networks: Generalization Capability, Beyond the Infinite Width, and Feature Learning
[65] Feature learning and generalization in deep networks with orthogonal weights
[66] Scaling laws in linear regression: Compute, parameters, and data
[67] Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra
[68] Scaling laws of optimization
[69] Deep neural newsvendor
[70] Feature learning in infinite-depth neural networks
Precise characterization of spectral properties of trained network weights across all phases
The authors derive exact formulas for the eigenvalue distribution of learned weights across all training phases. They show that learned weights are noisy, soft-thresholded versions of the target spectrum, with the spectrum consisting of spikes, bulk components, and zero eigenvalues depending on the training regime.
[51] How powerful are spectral graph neural networks
[52] Random matrix theory analysis of neural network weight matrices
[53] Hessian eigenvectors and principal component analysis of neural network weight matrices
[54] From SGD to Spectra: A Theory of Neural Network Weight Dynamics
[55] Spectral evolution and invariance in linear-width neural networks
[56] Spectral complexity of deep neural networks
[57] Eigenspectrum Analysis of Neural Networks without Aspect Ratio Bias
[58] Spectral Adapter: Fine-Tuning in Spectral Space
[59] Graph Neural Network-based Spectral Filtering Mechanism for Imbalance Classification in Network Digital Twin
[60] Models of Heavy-Tailed Mechanistic Universality
First-principles theoretical validation of spectra-generalization connection
The authors establish a universal error decomposition that directly connects spectral features (bulk, spikes, outliers) to distinct error components (overfitting, underfitting, approximation error). This provides a mathematical foundation for empirical observations that heavy-tailed weight spectra correlate with better generalization.