Training-Free Determination of Network Width via Neural Tangent Kernel
Overview
Overall Novelty Assessment
The paper proposes a training-free method for determining optimal neural network width by analyzing the smallest eigenvalue of the Neural Tangent Kernel (NTK) at initialization. It resides in the 'Training-Free Width and Sparsity Determination' leaf, which contains only three papers, indicating a relatively sparse research direction within the broader taxonomy. The leaf sits under 'Training-Free Architecture Selection and Neural Architecture Search', a placement that distinguishes the paper from purely theoretical NTK studies and from general NAS frameworks that rank diverse architectures rather than specifically sizing network width.
The taxonomy reveals neighboring work in 'Training-Free NAS with NTK Metrics' (five papers on architecture ranking) and extensive theoretical branches analyzing NTK eigenvalue bounds and convergence properties. The paper bridges practical architecture selection and theoretical eigenvalue analysis, drawing on insights from 'Finite-Width NTK Eigenvalue Bounds' while targeting a different application domain. Related work on sparsity determination (e.g., Phew Sparse Networks) addresses connectivity patterns rather than width, and general NAS methods optimize over broader search spaces without focusing on the width-saturation phenomenon this paper targets.
Among the 25 candidates examined, all three contributions show some overlap with prior work. For the infinite-width theory linking test error to the smallest eigenvalue, 10 candidates were examined and 3 refutable matches found; for the finite-width extension, 10 candidates were examined and 2 refutable matches found; and for the training-free width selection method, 5 candidates were examined and 1 refutable match found. These statistics suggest that while the specific combination of contributions may be novel, the individual theoretical components and the general idea of using NTK eigenvalues for architecture decisions both have precedent in the limited literature sample reviewed.
Based on the top-25 semantic matches examined, the work appears to synthesize existing theoretical insights into a practical width-determination tool, occupying a sparsely populated niche within training-free architecture selection. The analysis does not cover the full breadth of NTK literature or exhaustive architecture search methods, so additional relevant prior work may exist beyond this search scope.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors establish a theoretical upper bound on test error for infinite-width neural networks that is proportional to the inverse square of the smallest eigenvalue of the Neural Tangent Kernel, analyzed through kernel ridgeless regression.
The authors extend their theoretical analysis to finite-width networks, demonstrating that the test error upper bound remains controlled by the inverse square of the smallest eigenvalue of the empirical NTK at initialization.
The authors introduce a practical algorithm that identifies the cardinal width (where generalization performance saturates) by monitoring when the smallest eigenvalue of the NTK at initialization stops growing, enabling width determination without training.
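In symbols, the first two contributions claim a bound of the schematic form below. This is only an illustrative rendering of the claimed scaling; the constant (which would absorb sample size and the norm of the target function) and the precise error metric are not specified in this summary:

```latex
% Schematic form of the claimed generalization bound, where
% \lambda_{\min}(\Theta) is the smallest eigenvalue of the NTK:
% the limiting kernel in the infinite-width case, the empirical
% NTK at initialization in the finite-width case. C is an
% unspecified problem-dependent constant.
\mathcal{E}_{\mathrm{test}} \;\le\; \frac{C}{\lambda_{\min}(\Theta)^{2}}
```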
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[14] Phew: Constructing sparse networks that learn fast and generalize well without training data
[17] The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis
Contribution Analysis
Detailed comparisons for each claimed contribution
Infinite-width theory linking test error to smallest NTK eigenvalue
The authors establish a theoretical upper bound on test error for infinite-width neural networks that is proportional to the inverse square of the smallest eigenvalue of the Neural Tangent Kernel, analyzed through kernel ridgeless regression.
[15] Generalization Properties of NAS under Activation and Skip Connection Search
[19] Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks
[35] Disentangling trainability and generalization in deep neural networks
[5] Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
[18] Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
[32] Mathematical Foundations of Neural Tangents and Infinite-Width Networks
[33] Spectrum dependent learning curves in kernel regression and wide neural networks
[34] How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings
[36] How does a kernel based on gradients of infinite-width neural networks come to be widely used: a review of the neural tangent kernel
[37] Benign overfitting in deep neural networks under lazy training
Finite-width theory extending smallest eigenvalue bound to practical networks
The authors extend their theoretical analysis to finite-width networks, demonstrating that the test error upper bound remains controlled by the inverse square of the smallest eigenvalue of the empirical NTK at initialization.
[15] Generalization Properties of NAS under Activation and Skip Connection Search
[38] The interpolation phase transition in neural networks: Memorization and generalization under lazy training
[18] Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
[36] How does a kernel based on gradients of infinite-width neural networks come to be widely used: a review of the neural tangent kernel
[39] α-ReQ: Assessing Representation Quality in Self-Supervised Learning by measuring eigenspectrum decay
[40] Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models
[41] The Three Paradigms of Physics-Informed Learning: Neural Networks (PINNs), Neural Operators (PINOs), and Reinforcement Learning (PIRL)
[42] Distributed PCA-based anomaly detection in wireless sensor networks
[43] On a Mathematical Understanding of Deep Neural Networks
[44] Focusing of pulsed neutrons by traveling magnetic potentials
Training-free width selection method based on NTK eigenvalue saturation
The authors introduce a practical algorithm that identifies the cardinal width (where generalization performance saturates) by monitoring when the smallest eigenvalue of the NTK at initialization stops growing, enabling width determination without training.
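As a concrete illustration of the described procedure (not the authors' implementation), the sketch below computes the smallest eigenvalue of the empirical NTK of a two-layer ReLU network at random initialization, then scans increasing widths until that eigenvalue stops growing. The two-layer architecture, the single random draw per width, and the saturation tolerance `rel_tol` are all simplifying assumptions made here for brevity:

```python
import numpy as np

def empirical_ntk_min_eig(X, width, seed=0):
    """Smallest eigenvalue of the empirical NTK Gram matrix at initialization.

    Model (an assumption of this sketch): a two-layer ReLU network
    f(x) = (1/sqrt(m)) * a^T relu(W x) with W, a drawn i.i.d. N(0, 1).
    The NTK entry is <grad_theta f(x_i), grad_theta f(x_j)>.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((width, d))    # first-layer weights
    a = rng.standard_normal(width)         # second-layer weights
    pre = X @ W.T                          # (n, width) pre-activations
    act = np.maximum(pre, 0.0)             # ReLU features
    dact = (pre > 0).astype(float)         # ReLU derivative (a.e.)
    scale = 1.0 / np.sqrt(width)
    # Jacobian of f w.r.t. all parameters, stacked: [dW | da]
    J_W = scale * (a * dact)[:, :, None] * X[:, None, :]   # (n, width, d)
    J = np.concatenate([J_W.reshape(n, -1), scale * act], axis=1)
    K = J @ J.T                            # empirical NTK Gram matrix (PSD)
    return np.linalg.eigvalsh(K)[0]        # eigvalsh sorts ascending

def cardinal_width(X, widths, rel_tol=0.05):
    """Return the first width at which lambda_min stops growing by > rel_tol."""
    eigs = [empirical_ntk_min_eig(X, m) for m in widths]
    for (m, e0), e1 in zip(zip(widths, eigs), eigs[1:]):
        if e1 <= e0 * (1.0 + rel_tol):     # growth has saturated
            return m
    return widths[-1]                      # no saturation observed in range

# Usage: unit-norm inputs, candidate widths in increasing order.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
print(cardinal_width(X, [8, 16, 32, 64, 128, 256]))
```

In practice one would likely average `empirical_ntk_min_eig` over several random initializations per width before testing for saturation, since a single draw of the eigenvalue is noisy.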