Abstract:

For overparameterized linear regression with isotropic Gaussian design and the minimum-ℓp interpolator ŵp, p ∈ (1, 2], we give a unified, high-probability characterization of how the family of parameter norms {‖ŵp‖r : r ∈ [1, p]} scales with sample size.

We solve this basic but previously unresolved question through a simple dual-ray analysis, which reveals a competition between a signal spike and a bulk of null coordinates in X⊤Y. The analysis yields closed-form predictions for (i) a data-dependent transition size n⋆ (the "elbow"), and (ii) a universal threshold r⋆ = 2(p−1) that separates the norms ‖ŵp‖r that plateau from those that continue to grow with an explicit exponent.
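As a quick worked check of where this threshold sits (our arithmetic, not a claim from the paper): since the family ranges over r ∈ [1, p], the threshold r⋆ = 2(p−1) lies inside the family exactly when p ∈ [3/2, 2].

```latex
% Where does r_\star = 2(p-1) fall relative to the family r \in [1, p]?
r_\star \ge 1 \iff 2(p-1) \ge 1 \iff p \ge \tfrac{3}{2}, \qquad
r_\star \le p \iff 2(p-1) \le p \iff p \le 2.
% Examples: p = 3/2 \Rightarrow r_\star = 1; \quad
%           p = 7/4 \Rightarrow r_\star = 3/2; \quad
%           p = 2   \Rightarrow r_\star = 2.
% For p < 3/2 the threshold sits below the whole family, so every
% \ell_r norm with r \in [1, p] falls on the same side of r_\star.
```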

This unified solution resolves the scaling of all ℓr norms in the family r ∈ [1, p] under ℓp-biased interpolation, and explains in one picture which norms saturate and which increase as n grows.
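To make the quantities concrete, here is a minimal simulation sketch (our illustration; the planted 1-sparse signal w_star, noiseless labels, problem sizes, and the cvxpy solver are all assumptions, not details from the paper). It computes the minimum-ℓp interpolator for an isotropic Gaussian design at several sample sizes and prints the ℓr norms in the family r ∈ [1, p]:

```python
# Minimal sketch: minimum-l_p interpolation under isotropic Gaussian design.
import numpy as np
import cvxpy as cp

def min_lp_interpolator(X, y, p):
    """Solve argmin_w ||w||_p subject to X w = y (convex for p >= 1)."""
    w = cp.Variable(X.shape[1])
    cp.Problem(cp.Minimize(cp.pnorm(w, p)), [X @ w == y]).solve()
    return w.value

rng = np.random.default_rng(0)
d, p = 1000, 1.5                     # ambient dimension, l_p geometry
w_star = np.zeros(d)
w_star[0] = 1.0                      # planted spike (assumption)

for n in [25, 50, 100, 200]:         # overparameterized regime: n << d
    X = rng.standard_normal((n, d))  # isotropic Gaussian design
    y = X @ w_star                   # noiseless responses (assumption)
    w_hat = min_lp_interpolator(X, y, p)
    norms = {r: np.linalg.norm(w_hat, r) for r in (1.0, 1.25, 1.5)}
    print(n, norms)
```

Plotting each norm against n on log-log axes is where the elbow at n⋆ and the plateau-versus-growth split around r⋆ = 2(p−1) would be expected to appear.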

We then study diagonal linear networks (DLNs) trained by gradient descent. By calibrating the initialization scale α to an effective geometry peff(α) via the DLN separable potential, we show empirically that DLNs inherit the same elbow/threshold laws, providing a predictive bridge between explicit and implicit bias.
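For readers who want to reproduce the qualitative behavior, here is a minimal DLN training sketch under one standard parameterization, w = u⊙u − v⊙v with initialization u = v = α·1 (our assumption; the paper's exact parameterization, step sizes, and its peff(α) calibration map are not reproduced here):

```python
# Minimal sketch (not the authors' code): a diagonal linear network
# w = u*u - v*v trained by full-batch gradient descent from u = v = alpha*1.
# The parameterization, step size, and step count are illustrative choices.
import numpy as np

def train_dln(X, y, alpha, lr=1e-3, steps=200_000):
    n, d = X.shape
    u = alpha * np.ones(d)
    v = alpha * np.ones(d)
    for _ in range(steps):
        w = u * u - v * v
        g = X.T @ (X @ w - y) / n  # gradient of (1/2n)||Xw - y||^2 w.r.t. w
        u -= lr * (2 * u * g)      # chain rule: dw/du = 2u
        v -= lr * (-2 * v * g)     # chain rule: dw/dv = -2v
    return u * u - v * v

# Usage idea: sweep alpha, measure each run's norm-vs-n curves, and compare
# against minimum-l_p interpolators with p = p_eff(alpha).
```

The known qualitative picture for this parameterization is that small α biases gradient descent toward ℓ1-like solutions and large α toward ℓ2-like ones, which is what makes a calibration α ↦ peff(α) meaningful.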

Given that many generalization proxies depend on ‖ŵp‖r, our results suggest that their predictive power will depend sensitively on which ℓr norm is used.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper provides a unified closed-form characterization of parameter norm scaling for minimum-ℓp interpolators in overparameterized linear regression with isotropic Gaussian design. It resides in the 'Closed-Form Characterizations for Isotropic Gaussian Design' leaf, which contains only three papers total (including this one). This represents a sparse, highly specialized research direction within the broader study of explicit bias via minimum-norm interpolation, suggesting the work addresses a focused theoretical gap in understanding how different ℓr norms scale across the family r∈[1,p].

The taxonomy reveals a single main branch ('Explicit Bias via Minimum-Norm Interpolation') with one active leaf, indicating limited diversification in this research area. The scope explicitly excludes implicit bias from optimization dynamics, positioning this work within a purely regularization-theoretic framework. The two sibling papers in the same leaf likely address related norm-scaling questions under similar Gaussian assumptions, but the taxonomy structure suggests neighboring directions (e.g., non-Gaussian designs, implicit bias from gradient descent) remain largely unexplored in the current literature base.

Among the 25 candidates examined across the three contributions, no refuting prior work was identified: the first contribution (unified scaling laws) was checked against 8 candidates, the dual-ray analysis against 7, and the diagonal linear network extension against 10, with zero refutations in each case. This suggests that, within the limited search scope (focused on top semantic matches and citations), the specific combination of unified ℓr-norm families, spike-bulk competition analysis, and the data-dependent transition n⋆ appears to have no direct precedent in the examined literature.

The analysis covers a narrow semantic neighborhood (25 papers) rather than an exhaustive survey of overparameterized regression. The absence of refutations reflects the search scope and the paper's technical specificity (e.g., the threshold r⋆ = 2(p−1), calibration via the DLN separable potential) rather than a definitive claim of field-wide novelty. Broader connections to implicit bias, non-Gaussian settings, or empirical deep learning remain outside this assessment's purview.

Taxonomy

Core-task Taxonomy Papers: 2
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 0

Research Landscape Overview

Core task: Scaling of parameter norms with sample size in overparameterized linear regression. This field examines how the magnitude of learned parameters behaves as training data grows in settings where the number of parameters exceeds the number of samples.

The taxonomy centers on a single main branch, Explicit Bias via Minimum-Norm Interpolation, which focuses on characterizing the implicit regularization that arises when fitting overparameterized models by selecting the minimum-norm solution among all interpolating predictors. Within this branch, researchers derive closed-form expressions and asymptotic scaling laws, often under idealized design assumptions such as isotropic Gaussian features, to understand how parameter norms evolve with sample size and to connect these norms to generalization performance. A particularly active line of work within this branch investigates closed-form characterizations for isotropic Gaussian design matrices, where the statistical structure of the data enables precise mathematical analysis.

Norm Scaling Overparameterized[0] sits squarely in this cluster, providing explicit scaling results that complement closely related studies such as Norm Scaling Overparameterized[1] and Norm Scaling Overparameterized[2], which also explore norm behavior under similar design conditions. The main themes across these works involve trade-offs between model complexity, sample efficiency, and the stability of minimum-norm solutions, with open questions remaining about how these insights extend to more realistic, non-Gaussian or anisotropic settings. By focusing on tractable Gaussian scenarios, Norm Scaling Overparameterized[0] contributes foundational understanding of parameter scaling that may inform broader theories of implicit bias in overparameterized learning.

Claimed Contributions

Unified closed-form scaling laws for parameter norm families under ℓp bias

The authors derive the first unified closed-form scaling laws characterizing how the entire family of ℓr norms scales with sample size for minimum-ℓp interpolators in overparameterized linear regression. They identify a universal threshold r⋆ = 2(p−1) separating norms that plateau from those that grow, and provide explicit expressions for the transition size n⋆ and growth exponents in both spike- and bulk-dominated regimes.

8 retrieved papers
Dual-ray analysis revealing spike-bulk competition

The authors introduce a one-dimensional dual-ray analysis technique that exposes the competition between the signal spike and the bulk of null coordinates in X⊤Y. This analysis yields closed-form predictions for both a data-dependent transition point n⋆ and the universal threshold r⋆ that determines which norms plateau and which continue to grow.

7 retrieved papers
Extension to diagonal linear networks via initialization-to-geometry calibration

The authors extend their theoretical framework to diagonal linear networks trained by gradient descent by developing a calibration map from initialization scale α to an effective geometry parameter peff(α). This calibration demonstrates that DLNs exhibit the same elbow and threshold behavior as explicit minimum-ℓp interpolation, providing a predictive bridge between explicit and implicit bias.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Unified closed-form scaling laws for parameter norm families under ℓp bias

The authors derive the first unified closed-form scaling laws characterizing how the entire family of ℓr norms scales with sample size for minimum-ℓp interpolators in overparameterized linear regression. They identify a universal threshold r⋆ = 2(p−1) separating norms that plateau from those that grow, and provide explicit expressions for the transition size n⋆ and growth exponents in both spike- and bulk-dominated regimes.

Contribution

Dual-ray analysis revealing spike-bulk competition

The authors introduce a one-dimensional dual-ray analysis technique that exposes the competition between the signal spike and the bulk of null coordinates in X⊤Y. This analysis yields closed-form predictions for both a data-dependent transition point n⋆ and the universal threshold r⋆ that determines which norms plateau and which continue to grow.

Contribution

Extension to diagonal linear networks via initialization-to-geometry calibration

The authors extend their theoretical framework to diagonal linear networks trained by gradient descent by developing a calibration map from initialization scale α to an effective geometry parameter peff(α). This calibration demonstrates that DLNs exhibit the same elbow and threshold behavior as explicit minimum-lp interpolation, providing a predictive bridge between explicit and implicit bias.