Training-Free Determination of Network Width via Neural Tangent Kernel

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: neural tangent kernel, kernel regression, smallest eigenvalue, generalization error
Abstract:

Determining an appropriate size for an artificial neural network under computational constraints is a fundamental challenge. This paper introduces a practical metric, derived from the Neural Tangent Kernel (NTK), for estimating the minimum necessary network width with respect to test loss, prior to training. We provide both theoretical and empirical evidence that the smallest eigenvalue of the NTK strongly influences test loss in wide but finite-width neural networks. Based on this observation, we define an NTK-based metric, computed at initialization, that identifies what we call the cardinal width, i.e., the width at which a network's generalization performance saturates. Our experiments across multiple datasets and architectures demonstrate the effectiveness of this metric in estimating the cardinal width.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a training-free method to determine optimal neural network width by analyzing the smallest eigenvalue of the Neural Tangent Kernel at initialization. It resides in the 'Training-Free Width and Sparsity Determination' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. This leaf sits under 'Training-Free Architecture Selection and Neural Architecture Search', distinguishing it from purely theoretical NTK studies and general NAS frameworks that rank diverse architectures rather than specifically sizing network width.

The taxonomy reveals neighboring work in 'Training-Free NAS with NTK Metrics' (five papers on architecture ranking) and extensive theoretical branches analyzing NTK eigenvalue bounds and convergence properties. The paper bridges practical architecture selection and theoretical eigenvalue analysis, drawing on insights from 'Finite-Width NTK Eigenvalue Bounds' while targeting a different application domain. Related work on sparsity determination (e.g., Phew Sparse Networks) addresses connectivity patterns rather than width, and general NAS methods optimize over broader search spaces without focusing on the width-saturation phenomenon this paper targets.

Among the 25 candidates examined, all three contributions show some overlap with prior work. For the infinite-width theory linking test error to the smallest eigenvalue, 10 candidates were examined and 3 refutable matches found; for the finite-width extension, 10 candidates yielded 2 refutable matches; for the training-free width selection method, 5 candidates yielded 1 refutable match. These statistics suggest that while the specific combination of contributions may be novel, the individual theoretical components and the general idea of using NTK eigenvalues for architecture decisions have precedent in the limited literature sample reviewed.

Based on the top-25 semantic matches examined, the work appears to synthesize existing theoretical insights into a practical width-determination tool, occupying a sparsely populated niche within training-free architecture selection. The analysis does not cover the full breadth of NTK literature or exhaustive architecture search methods, so additional relevant prior work may exist beyond this search scope.

Taxonomy

Core-task Taxonomy Papers: 28
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 6

Research Landscape Overview

Core task: training-free neural network width determination using neural tangent kernel eigenvalues. The field structure reflects a convergence of theoretical insights and practical architecture design. At the highest level, one branch focuses on Training-Free Architecture Selection and Neural Architecture Search, encompassing methods that bypass expensive training cycles by leveraging proxy metrics such as NTK properties or gradient statistics (e.g., NAS ImageNet Four Hours[1], Ultra-fast NAS NTK[23]). A second branch, NTK Eigenvalue Theory and Convergence Analysis, delves into the spectral properties of neural tangent kernels, examining how eigenvalue distributions govern trainability and convergence (e.g., Deformed Semicircle Law[5], Smallest Eigenvalue Bounds[19]). A third branch, NTK-Based Generalization and Optimization Analysis, connects kernel spectra to generalization guarantees and optimization dynamics (e.g., NTK Eigenvalues Generalization[13], Wide Networks Generalization[6]). Finally, Theoretical Frameworks for Wide Network Learning explores the infinite-width regime and its implications for learning theory (e.g., Infinite-Width NTKs[2], Infinite Attention NNGP[3]).

Within this landscape, particularly active lines of work explore how spectral measures can guide architecture choices without full training. Training-Free Network Width[0] sits naturally within the Training-Free Architecture Selection branch, specifically addressing width and sparsity determination by analyzing NTK eigenvalues to predict optimal network capacity. This approach contrasts with nearby efforts such as Phew Sparse Networks[14], which targets sparsity patterns, and Graphon Limit Pruning[17], which employs graphon theory for pruning decisions.
While many studies in the NTK Eigenvalue Theory branch focus on asymptotic convergence guarantees or spectral laws in the infinite-width limit, Training-Free Network Width[0] emphasizes a practical, finite-width setting where eigenvalue diagnostics directly inform architectural hyperparameters. This positioning highlights an ongoing tension between rigorous theoretical characterizations of wide networks and the need for computationally efficient, training-free heuristics that practitioners can deploy at scale.

Claimed Contributions

Infinite-width theory linking test error to smallest NTK eigenvalue

The authors establish a theoretical upper bound on test error for infinite-width neural networks that is proportional to the inverse square of the smallest eigenvalue of the Neural Tangent Kernel, analyzed through kernel ridgeless regression.

10 retrieved papers · Can Refute
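As a schematic restatement in our own notation (the paper's exact constants and assumptions are not reproduced here): for kernel ridgeless regression with the infinite-width NTK, the claimed bound has the form

```latex
% Ridgeless kernel regression predictor on training data (X, y),
% with K_{ij} = \Theta(x_i, x_j) the infinite-width NTK Gram matrix
% and k(x, X) the vector of kernel values between a test point and X:
\[
  f(x) \;=\; k(x, X)^{\top} K^{-1} y
\]
% Claimed test-error bound, with C a problem-dependent constant:
\[
  \mathcal{L}_{\mathrm{test}} \;\le\; \frac{C}{\lambda_{\min}(K)^{2}}
\]
```

A larger smallest eigenvalue tightens the bound, which is what makes the quantity a candidate training-free proxy for generalization.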
Finite-width theory extending smallest eigenvalue bound to practical networks

The authors extend their theoretical analysis to finite-width networks, demonstrating that the test error upper bound remains controlled by the inverse square of the smallest eigenvalue of the empirical NTK at initialization.

10 retrieved papers · Can Refute
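The finite-width quantity at issue, the smallest eigenvalue of the empirical NTK at initialization, can be computed directly from per-example parameter gradients. A minimal sketch, assuming a hypothetical two-layer ReLU network (not the paper's architecture); the function name and model are illustrative:

```python
import numpy as np

def empirical_ntk_min_eig(X, width, seed=0):
    """Smallest eigenvalue of the empirical NTK Gram matrix at initialization
    for a toy two-layer ReLU net f(x) = v^T relu(W x) / sqrt(width).
    This model is a stand-in, not the paper's architecture."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((width, d))   # first-layer weights at init
    v = rng.standard_normal(width)        # second-layer weights at init
    grads = []
    for x in X:
        pre = W @ x                                        # pre-activations
        dv = np.maximum(pre, 0.0) / np.sqrt(width)         # grad w.r.t. v
        dW = np.outer(v * (pre > 0), x) / np.sqrt(width)   # grad w.r.t. W
        grads.append(np.concatenate([dv, dW.ravel()]))
    G = np.stack(grads)                   # (n, n_params) Jacobian of outputs
    K = G @ G.T                           # empirical NTK Gram matrix, (n, n)
    return float(np.linalg.eigvalsh(K)[0])  # eigvalsh sorts ascending

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))           # 8 synthetic training inputs
lam_min = empirical_ntk_min_eig(X, width=256)
```

Because the Gram matrix G G^T is positive semidefinite, the smallest eigenvalue is nonnegative, and for n well below the parameter count it is typically strictly positive.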
Training-free width selection method based on NTK eigenvalue saturation

The authors introduce a practical algorithm that identifies the cardinal width (where generalization performance saturates) by monitoring when the smallest eigenvalue of the NTK at initialization stops growing, enabling width determination without training.

5 retrieved papers · Can Refute
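The described saturation criterion can be sketched as a width sweep that stops once the smallest eigenvalue's relative growth stalls. The toy two-layer model, all function names, and the 5% tolerance below are our assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

def ntk_min_eig(X, width, seed=0):
    """Smallest eigenvalue of the empirical NTK at initialization for a toy
    two-layer ReLU net f(x) = v^T relu(W x) / sqrt(width) (illustrative)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((width, d))
    v = rng.standard_normal(width)
    grads = []
    for x in X:
        pre = W @ x
        dv = np.maximum(pre, 0.0) / np.sqrt(width)
        dW = np.outer(v * (pre > 0), x) / np.sqrt(width)
        grads.append(np.concatenate([dv, dW.ravel()]))
    G = np.stack(grads)
    return float(np.linalg.eigvalsh(G @ G.T)[0])

def cardinal_width(X, widths, tol=0.05):
    """Return the first width after which lambda_min grows by less than
    `tol` (relative); fall back to the largest width tried."""
    lams = [ntk_min_eig(X, w) for w in widths]
    for prev, cur, w_prev in zip(lams, lams[1:], widths):
        if (cur - prev) / max(abs(prev), 1e-12) < tol:
            return w_prev          # eigenvalue has saturated by w_prev
    return widths[-1]

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))
widths = [16, 32, 64, 128, 256]
w_star = cardinal_width(X, widths)
```

All candidate widths are evaluated at initialization only, so the sweep requires no training, matching the stated spirit of the method.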

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Infinite-width theory linking test error to smallest NTK eigenvalue

Contribution 2: Finite-width theory extending smallest eigenvalue bound to practical networks

Contribution 3: Training-free width selection method based on NTK eigenvalue saturation