Training-Free Determination of Network Width via Neural Tangent Kernel

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: neural tangent kernel, kernel regression, smallest eigenvalue, generalization error
Abstract:

Determining an appropriate size for an artificial neural network under computational constraints is a fundamental challenge. This paper introduces a practical metric, derived from the Neural Tangent Kernel (NTK), for estimating the minimum necessary network width with respect to test loss, prior to training. We provide both theoretical and empirical evidence that the smallest eigenvalue of the NTK strongly influences test loss in wide but finite-width neural networks. Based on this observation, we define an NTK-based metric, computed at initialization, that identifies what we call the cardinal width, i.e., the width at which a network's generalization performance saturates. Our experiments across multiple datasets and architectures demonstrate the effectiveness of this metric in estimating the cardinal width.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a training-free method to determine optimal neural network width by analyzing the smallest eigenvalue of the Neural Tangent Kernel at initialization. It resides in the 'Training-Free Width and Sparsity Determination' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. This leaf sits under 'Training-Free Architecture Selection and Neural Architecture Search', distinguishing it from purely theoretical NTK studies and general NAS frameworks that rank diverse architectures rather than specifically sizing network width.

The taxonomy reveals neighboring work in 'Training-Free NAS with NTK Metrics' (five papers on architecture ranking) and extensive theoretical branches analyzing NTK eigenvalue bounds and convergence properties. The paper bridges practical architecture selection and theoretical eigenvalue analysis, drawing on insights from 'Finite-Width NTK Eigenvalue Bounds' while targeting a different application domain. Related work on sparsity determination (e.g., Phew Sparse Networks) addresses connectivity patterns rather than width, and general NAS methods optimize over broader search spaces without focusing on the width-saturation phenomenon this paper targets.

Among the 25 candidates examined, all three contributions show some overlap with prior work. For the infinite-width theory linking test error to the smallest eigenvalue, 10 candidates were examined and 3 refutable matches found; for the finite-width extension, 10 candidates yielded 2 refutable matches; for the training-free width selection method, 5 candidates yielded 1 refutable match. These statistics suggest that while the specific combination of contributions may be novel, the individual theoretical components and the general idea of using NTK eigenvalues for architecture decisions have precedent in the limited literature sample reviewed.

Based on the top-25 semantic matches examined, the work appears to synthesize existing theoretical insights into a practical width-determination tool, occupying a sparsely populated niche within training-free architecture selection. The analysis does not cover the full breadth of NTK literature or exhaustive architecture search methods, so additional relevant prior work may exist beyond this search scope.

Taxonomy

Core-task Taxonomy Papers: 28
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 6

Research Landscape Overview

Core task: training-free neural network width determination using neural tangent kernel eigenvalues. The field structure reflects a convergence of theoretical insights and practical architecture design. At the highest level, one branch focuses on Training-Free Architecture Selection and Neural Architecture Search, encompassing methods that bypass expensive training cycles by leveraging proxy metrics such as NTK properties or gradient statistics (e.g., NAS ImageNet Four Hours[1], Ultra-fast NAS NTK[23]). A second branch, NTK Eigenvalue Theory and Convergence Analysis, delves into the spectral properties of neural tangent kernels, examining how eigenvalue distributions govern trainability and convergence (e.g., Deformed Semicircle Law[5], Smallest Eigenvalue Bounds[19]). A third branch, NTK-Based Generalization and Optimization Analysis, connects kernel spectra to generalization guarantees and optimization dynamics (e.g., NTK Eigenvalues Generalization[13], Wide Networks Generalization[6]). Finally, Theoretical Frameworks for Wide Network Learning explores the infinite-width regime and its implications for learning theory (e.g., Infinite-Width NTKs[2], Infinite Attention NNGP[3]).

Within this landscape, particularly active lines of work explore how spectral measures can guide architecture choices without full training. Training-Free Network Width[0] sits naturally within the Training-Free Architecture Selection branch, specifically addressing width and sparsity determination by analyzing NTK eigenvalues to predict optimal network capacity. This approach contrasts with nearby efforts such as Phew Sparse Networks[14], which targets sparsity patterns, and Graphon Limit Pruning[17], which employs graphon theory for pruning decisions.
While many studies in the NTK Eigenvalue Theory branch focus on asymptotic convergence guarantees or spectral laws in the infinite-width limit, Training-Free Network Width[0] emphasizes a practical, finite-width setting where eigenvalue diagnostics directly inform architectural hyperparameters. This positioning highlights an ongoing tension between rigorous theoretical characterizations of wide networks and the need for computationally efficient, training-free heuristics that practitioners can deploy at scale.

Claimed Contributions

Infinite-width theory linking test error to smallest NTK eigenvalue

The authors establish a theoretical upper bound on test error for infinite-width neural networks that is proportional to the inverse square of the smallest eigenvalue of the Neural Tangent Kernel, analyzed through kernel ridgeless regression.

10 retrieved papers · Can Refute
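As a schematic restatement in our own notation (the paper's exact constants and assumptions are not reproduced here): for kernel ridgeless regression with the infinite-width NTK, the claimed bound has the form

```latex
% Ridgeless kernel regression predictor on training data (X, y),
% with K_{ij} = \Theta(x_i, x_j) the infinite-width NTK Gram matrix
% and k(x, X) the vector of kernel values between a test point and X:
\[
  f(x) \;=\; k(x, X)^{\top} K^{-1} y
\]
% Claimed test-error bound, with C a problem-dependent constant:
\[
  \mathcal{L}_{\mathrm{test}} \;\le\; \frac{C}{\lambda_{\min}(K)^{2}}
\]
```

A larger smallest eigenvalue tightens the bound, which is what makes the quantity a candidate training-free proxy for generalization.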
Finite-width theory extending smallest eigenvalue bound to practical networks

The authors extend their theoretical analysis to finite-width networks, demonstrating that the test error upper bound remains controlled by the inverse square of the smallest eigenvalue of the empirical NTK at initialization.

10 retrieved papers · Can Refute
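The finite-width quantity at issue, the smallest eigenvalue of the empirical NTK at initialization, can be computed directly from per-example parameter gradients. A minimal sketch, assuming a hypothetical two-layer ReLU network (not the paper's architecture); the function name and model are illustrative:

```python
import numpy as np

def empirical_ntk_min_eig(X, width, seed=0):
    """Smallest eigenvalue of the empirical NTK Gram matrix at initialization
    for a toy two-layer ReLU net f(x) = v^T relu(W x) / sqrt(width).
    This model is a stand-in, not the paper's architecture."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((width, d))   # first-layer weights at init
    v = rng.standard_normal(width)        # second-layer weights at init
    grads = []
    for x in X:
        pre = W @ x                                        # pre-activations
        dv = np.maximum(pre, 0.0) / np.sqrt(width)         # grad w.r.t. v
        dW = np.outer(v * (pre > 0), x) / np.sqrt(width)   # grad w.r.t. W
        grads.append(np.concatenate([dv, dW.ravel()]))
    G = np.stack(grads)                   # (n, n_params) Jacobian of outputs
    K = G @ G.T                           # empirical NTK Gram matrix, (n, n)
    return float(np.linalg.eigvalsh(K)[0])  # eigvalsh sorts ascending

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))           # 8 synthetic training inputs
lam_min = empirical_ntk_min_eig(X, width=256)
```

Because the Gram matrix G G^T is positive semidefinite, the smallest eigenvalue is nonnegative, and for n well below the parameter count it is typically strictly positive.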
Training-free width selection method based on NTK eigenvalue saturation

The authors introduce a practical algorithm that identifies the cardinal width (where generalization performance saturates) by monitoring when the smallest eigenvalue of the NTK at initialization stops growing, enabling width determination without training.

5 retrieved papers · Can Refute
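The described saturation criterion can be sketched as a width sweep that stops once the smallest eigenvalue's relative growth stalls. The toy two-layer model, all function names, and the 5% tolerance below are our assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

def ntk_min_eig(X, width, seed=0):
    """Smallest eigenvalue of the empirical NTK at initialization for a toy
    two-layer ReLU net f(x) = v^T relu(W x) / sqrt(width) (illustrative)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((width, d))
    v = rng.standard_normal(width)
    grads = []
    for x in X:
        pre = W @ x
        dv = np.maximum(pre, 0.0) / np.sqrt(width)
        dW = np.outer(v * (pre > 0), x) / np.sqrt(width)
        grads.append(np.concatenate([dv, dW.ravel()]))
    G = np.stack(grads)
    return float(np.linalg.eigvalsh(G @ G.T)[0])

def cardinal_width(X, widths, tol=0.05):
    """Return the first width after which lambda_min grows by less than
    `tol` (relative); fall back to the largest width tried."""
    lams = [ntk_min_eig(X, w) for w in widths]
    for prev, cur, w_prev in zip(lams, lams[1:], widths):
        if (cur - prev) / max(abs(prev), 1e-12) < tol:
            return w_prev          # eigenvalue has saturated by w_prev
    return widths[-1]

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 5))
widths = [16, 32, 64, 128, 256]
w_star = cardinal_width(X, widths)
```

All candidate widths are evaluated at initialization only, so the sweep requires no training, matching the stated spirit of the method.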

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Infinite-width theory linking test error to smallest NTK eigenvalue

Contribution 2: Finite-width theory extending smallest eigenvalue bound to practical networks

Contribution 3: Training-free width selection method based on NTK eigenvalue saturation