Training Overparametrized Neural Networks in Sublinear Time
Overview
Overall Novelty Assessment
The paper proposes a novel training method achieving O(m^(1-α)nd + n³) amortized cost per iteration for overparametrized neural networks, improving upon the O(mnd + n³) per-iteration complexity of prior work. It resides in the 'Near-Linear and Sublinear Second-Order Methods' leaf, which contains only one sibling paper. This leaf sits within the broader 'Sublinear-Time Training Algorithms' branch, which encompasses three distinct approaches: second-order methods, preprocessing-based acceleration, and sketching/sampling techniques. The sparse population of this specific leaf suggests the paper targets a relatively focused research direction within the larger field of training acceleration.
The taxonomy reveals neighboring research directions that contextualize this work. Sibling leaves include 'Preprocessing-Based Acceleration' and 'Sketching and Sampling for Scalable Training', which pursue sublinear complexity through different mechanisms—preprocessing network structure versus randomized dimensionality reduction. The broader 'Convergence Analysis' branch addresses theoretical guarantees for first-order methods and reinforcement learning settings, while 'Efficient Training Infrastructure' focuses on distributed systems and network pruning. The paper's binary search tree representation appears to bridge algorithmic innovation with data structure design, potentially connecting to complexity analysis work in the infrastructure branch.
Of the 21 candidate papers examined in total, the sublinear-time training algorithm contribution shows the most prior-work overlap: 3 of the 10 candidates examined for it appear refutable. The binary search tree representation contribution was checked against 10 candidates, none of which clearly refutes it, suggesting greater novelty in this conceptual framing. The lower bound contribution was checked against only 1 candidate, which appears refutable, though the limited search scope makes a definitive assessment difficult. These statistics indicate that while the core algorithmic speedup builds on established second-order-method research, the data-structure perspective may offer a more distinctive angle within the examined literature.
Based on the top-21 semantic matches and citation expansion, the work appears to advance an active but not densely populated research direction. The taxonomy structure shows the field has diversified into multiple acceleration strategies, and this paper's position in a sparsely populated leaf suggests it pursues a specific technical approach rather than entering a crowded subfield. The limited search scope means potentially relevant work in adjacent areas—such as preprocessing methods or sketching techniques—may not have been fully captured in the contribution-level analysis.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose a training algorithm for two-layer ReLU neural networks with m = poly(n) parameters that achieves a sublinear cost per iteration of O(m^(1-α)nd + n³), improving over the previous linear-time O(mnd + n³) per iteration while maintaining the same fast convergence rate as second-order methods.
The authors introduce a novel representation of neural networks as collections of binary search trees, in which the relationship between network weights and training data is encoded in the tree structure. This view enables efficient training by exploiting the sparsity of activations to update only a small subset of tree nodes per iteration.
The authors establish a conditional lower bound showing that under the Strong Exponential Time Hypothesis or Orthogonal Vector Conjecture, their dynamic sensitive weight searching data structure cannot be substantially improved, demonstrating near-optimality of their approach.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Training (Overparametrized) Neural Networks in Near-Linear Time
Contribution Analysis
Detailed comparisons for each claimed contribution
Sublinear-time training algorithm for overparametrized neural networks
The authors propose a training algorithm for two-layer ReLU neural networks with m = poly(n) parameters that achieves a sublinear cost per iteration of O(m^(1-α)nd + n³), improving over the previous linear-time O(mnd + n³) per iteration while maintaining the same fast convergence rate as second-order methods.
[1] Training (Overparametrized) Neural Networks in Near-Linear Time
[3] Does preprocessing help training over-parameterized neural networks?
[17] Bypass exponential time preprocessing: Fast neural network training via weight-data correlation preprocessing
[4] A dynamical view on optimization algorithms of overparameterized neural networks
[8] Neural temporal-difference learning converges to global optima
[9] Neural trust region/proximal policy optimization attains globally optimal policy
[15] Fast convergence of natural gradient descent for over-parameterized neural networks
[16] Train faster, perform better: modular adaptive training in over-parameterized models
[18] Efficient sgd neural network training via sublinear activated neuron identification
[19] Minnorm training: an algorithm for training over-parameterized deep neural networks
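As a rough numerical illustration of the gap between the claimed O(m^(1-α)nd + n³) per-iteration cost and the prior O(mnd + n³) bound, the sketch below plugs sample values into both cost models; the choice α = 0.1, the width m = n⁴, and the unit constant factors are assumptions made for illustration, not values taken from the paper:

```python
# Illustrative cost models only: the constants, alpha = 0.1, and the
# width m = n^4 are assumptions, not values from the paper.

def linear_cost(m, n, d):
    """Prior per-iteration cost model: O(m*n*d + n^3)."""
    return m * n * d + n ** 3

def sublinear_cost(m, n, d, alpha=0.1):
    """Claimed per-iteration cost model: O(m^(1-alpha)*n*d + n^3)."""
    return m ** (1 - alpha) * n * d + n ** 3

n, d = 1_000, 100   # number of training points and input dimension
m = n ** 4          # overparametrized width, m = poly(n)

speedup = linear_cost(m, n, d) / sublinear_cost(m, n, d)
print(f"speedup factor: {speedup:.1f}x")  # grows with m since alpha > 0
```

Because the saving is a multiplicative m^α factor on the dominant mnd term, the benefit of the sublinear bound grows with the degree of overparametrization.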
Binary search tree representation of neural networks
The authors introduce a novel representation of neural networks as collections of binary search trees, in which the relationship between network weights and training data is encoded in the tree structure. This view enables efficient training by exploiting the sparsity of activations to update only a small subset of tree nodes per iteration.
[20] Tree-NeRV: A Tree-Structured Neural Representation for Efficient Non-Uniform Video Encoding
[21] BT-CNN: a balanced binary tree architecture for classification of brain tumour using MRI imaging
[22] Attention convolutional binary neural tree for fine-grained visual categorization
[23] Rewiring networks for graph neural network training using discrete geometry
[24] Adaptive neural trees
[25] Alternating Multi-bit Quantization for Recurrent Neural Networks
[26] Classification trees with neural network feature extraction
[27] Routing Strategies for Isochronal-Evolution Random Matching Network
[28] HBST: A hamming distance embedding binary search tree for feature-based visual place recognition
[29] Learning binary decision trees by argmin differentiation
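To make the tree-based view concrete, here is a minimal, hypothetical sketch of the kind of mechanism such a representation enables: a segment tree over per-neuron scores (e.g., inner products ⟨w_r, x⟩) where a point update touches only O(log m) tree nodes and all neurons whose score exceeds an activation threshold are reported in time proportional to the number reported. The paper's actual data structure is more involved; this only illustrates updating a small subset of tree nodes per iteration.

```python
class MaxSegmentTree:
    """Segment tree over per-neuron scores, supporting point updates in
    O(log m) and reporting all neurons whose score exceeds a threshold
    in O(k log m), where k is the number of neurons reported.
    Illustrative simplification of a tree-based sparse-activation search."""

    def __init__(self, scores):
        self.m = len(scores)
        size = 1
        while size < self.m:
            size *= 2
        self.size = size
        # Internal nodes store the max of their subtree; unused leaves are -inf.
        self.tree = [float("-inf")] * (2 * size)
        self.tree[size:size + self.m] = scores
        for i in range(size - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, r, score):
        """Update neuron r's score, touching only O(log m) tree nodes."""
        i = self.size + r
        self.tree[i] = score
        i //= 2
        while i:
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def above(self, threshold, i=1):
        """Return indices of all neurons with score > threshold."""
        if self.tree[i] <= threshold:
            return []          # prune: nothing in this subtree activates
        if i >= self.size:
            return [i - self.size]
        return self.above(threshold, 2 * i) + self.above(threshold, 2 * i + 1)

scores = [0.2, -1.0, 0.9, 0.1, 1.5, -0.3, 0.0, 0.4]
t = MaxSegmentTree(scores)
print(t.above(0.3))   # -> [2, 4, 7]: the only activated neurons
t.update(4, -2.0)     # a weight update deactivates neuron 4
print(t.above(0.3))   # -> [2, 7]
```

When only a small fraction of neurons activate per data point, such a search costs far less than the O(m) scan a dense representation would require, which is the sparsity the contribution exploits.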
Lower bound for dynamic sensitive weight searching data structure
The authors establish a conditional lower bound showing that under the Strong Exponential Time Hypothesis or Orthogonal Vector Conjecture, their dynamic sensitive weight searching data structure cannot be substantially improved, demonstrating near-optimality of their approach.
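For context on what this lower bound assumes, the Orthogonal Vectors Conjecture can be stated as follows; this is the standard formulation from fine-grained complexity, not a quotation from the paper:

```latex
\textbf{Orthogonal Vectors Conjecture (OVC).} For every $\epsilon > 0$, there is no
algorithm that, given sets $A, B \subseteq \{0,1\}^{d}$ with $|A| = |B| = n$ and
$d = \omega(\log n)$, decides whether there exist $a \in A$ and $b \in B$ with
$\langle a, b \rangle = 0$ in $O(n^{2-\epsilon} \cdot \operatorname{poly}(d))$ time.
```

Since SETH is known to imply OVC, hardness conditioned on OVC also holds under SETH, which is why the paper can state its near-optimality result under either hypothesis.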