Adaptive Width Neural Networks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Neural Networks, Learning the Number of Neurons, Adaptive Width Learning, Dynamic Architectures, Information Compression, Variational Inference
Abstract:

For almost 70 years, researchers have typically selected the width of neural networks' layers either manually or through automated hyperparameter tuning methods such as grid search and, more recently, neural architecture search. This paper challenges the status quo by introducing an easy-to-use technique to learn an unbounded width of a neural network's layers during training. The method jointly optimizes the width and the parameters of each layer via standard backpropagation. We apply the technique to a broad range of data domains, such as tables, images, text, sequences, and graphs, showing how the width adapts to the task's difficulty. A by-product of our width-learning approach is easy truncation of the trained network at virtually zero cost, achieving a smooth trade-off between performance and compute resources. Alternatively, one can dynamically compress the network as long as performance does not degrade. In light of recent foundation models trained on large datasets, which require billions of parameters and for which hyperparameter tuning is unfeasible due to huge training costs, our approach introduces a viable alternative for width learning.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces an Adaptive Width Neural Networks (AWNN) framework that learns layer width during training via gradient-based optimization. It resides in the 'Direct Width Parameterization' leaf of the taxonomy, which contains only three papers total. This leaf sits within the broader 'Dynamic Width Adaptation via Gradient-Based Optimization' branch, indicating a relatively sparse research direction compared to the more crowded discrete growth mechanisms elsewhere in the field. The small sibling set suggests this continuous, differentiable approach to width learning remains less explored than heuristic or error-triggered expansion methods.

The taxonomy reveals neighboring branches that tackle width adaptation through alternative paradigms. Adjacent leaves include 'Gradient-Informed Neuron Addition' (using singular value decomposition for initialization) and 'Functional Steepest Descent for Architecture Growth' (employing second-order optimization in metric spaces). These contrast with the paper's first-order backpropagation approach. The broader 'Dynamic Architecture Evolution via Discrete Growth Mechanisms' branch encompasses error-based neuron addition and reinforcement learning methods, highlighting a fundamental divide between continuous parameterization and discrete expansion rules. The paper's position emphasizes smooth, end-to-end optimization rather than threshold-triggered growth.

Among the thirty candidates examined (ten per contribution), the AWNN framework contribution has two refutable candidates, the soft ordering mechanism has zero refutations, and the post-hoc truncation capability has one. Because the search is limited to top-K semantic matches rather than exhaustive coverage, these statistics are indicative only. Within this sample, the soft ordering contribution appears the most distinctive, whereas the core framework and the truncation feature encounter some overlapping prior work. The relatively small candidate pool and the sparse taxonomy leaf mean the analysis captures a focused but not comprehensive view of the related literature.

Given the limited thirty-candidate search and the sparse three-paper taxonomy leaf, the work appears to occupy a less-crowded niche within gradient-based width learning. The analysis does not cover the full breadth of neural architecture search or pruning literature, focusing instead on methods explicitly addressing unbounded or dynamic width. The contribution-level statistics indicate moderate novelty for the core framework and truncation, with the soft ordering mechanism showing stronger distinctiveness within the examined sample.

Taxonomy

Core-task Taxonomy Papers: 47
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 3

Research Landscape Overview

Core task: learning unbounded neural network layer width during training. The field encompasses a diverse set of strategies for dynamically adjusting network capacity as learning proceeds, rather than fixing architecture size in advance. At the highest level, the taxonomy distinguishes gradient-based optimization approaches that treat width as a continuous or differentiable parameter from discrete growth mechanisms that add neurons or layers through heuristic triggers. Other major branches address adaptive expansion for training efficiency, continual and lifelong learning scenarios where architectures must accommodate new tasks, methods that jointly grow and prune for compactness, self-organizing and competitive learning paradigms inspired by biological plasticity, domain-specific applications, theoretical analyses of training dynamics, auxiliary architectural components, cascade and depth-oriented growth, and ensemble-based strategies. Representative works such as Dynamically Expandable Networks[3] and GradMax[13] illustrate how different branches tackle the challenge of when and how to expand capacity, while methods like Grow and Prune[31] and Online Growing Pruning[14] combine expansion with compression to maintain efficiency.

Within this landscape, a particularly active line of research focuses on gradient-based width adaptation, where network width itself becomes a learnable quantity optimized alongside weights. Adaptive Width Networks[0] exemplifies this direct width parameterization approach, treating layer size as a first-class optimization target rather than relying on discrete addition rules. This contrasts with nearby works like Growing Networks Gradient[1] and Learning Network Width[27], which also leverage gradient signals but may differ in how they balance continuous relaxation against discrete architectural decisions, or in their handling of training stability during expansion.
Another vibrant thread explores dynamic architecture evolution through discrete triggers—methods such as Incrementally Growing Networks[6] and Self Expanding Networks[9] add capacity based on performance metrics or error thresholds. The interplay between these paradigms raises open questions about the trade-offs between smooth, differentiable growth and more abrupt, rule-driven expansion, as well as how to ensure that unbounded width remains computationally tractable and does not lead to overfitting. Adaptive Width Networks[0] sits squarely in the gradient-based optimization branch, emphasizing direct parameterization of width to enable end-to-end learning of architecture size alongside task objectives.

Claimed Contributions

Adaptive Width Neural Networks (AWNN) framework

The authors propose a probabilistic framework that learns the number of neurons in each layer during training through backpropagation, without requiring a fixed upper bound on layer width. This is achieved by maximizing a variational objective (ELBO) over both network parameters and layer widths.

10 retrieved papers
Can Refute
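The variational objective described above can be sketched in a generic form. The factorization below is an illustrative assumption rather than the paper's exact objective: network weights $\theta$, per-layer widths $k$ with prior $p(k)$, and a variational posterior $q_\phi(k)$ over widths:

```latex
\mathcal{L}(\theta, \phi)
  = \mathbb{E}_{q_\phi(k)}\!\left[\log p(\mathcal{D} \mid \theta, k)\right]
  - \mathrm{KL}\!\left(q_\phi(k)\,\|\,p(k)\right)
  \;\le\; \log p(\mathcal{D} \mid \theta)
```

Maximizing this evidence lower bound (ELBO) jointly over $\theta$ and the variational parameters $\phi$ fits the weights and the width distribution with ordinary backpropagation; the prior $p(k)$ places mass on all non-negative widths, which is what removes the need for a fixed upper bound.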
Soft ordering of neurons via monotonically decreasing importance function

The authors introduce a mechanism that rescales neuron activations using a monotonically decreasing importance function (implemented as a discretized exponential distribution). This imposes an ordering where newly added neurons have lower importance, enabling smooth width adaptation and breaking parametrization symmetries.

10 retrieved papers
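The soft-ordering mechanism can be sketched as rescaling each neuron's activation by a weight drawn from a discretized exponential distribution. This is a minimal sketch under stated assumptions: the function names, the rate parameter `lam`, and the normalization are illustrative, not the paper's exact parameterization.

```python
import numpy as np

def importance_weights(width: int, lam: float) -> np.ndarray:
    """Monotonically decreasing importance from a discretized exponential:
    neuron i gets mass proportional to exp(-i / lam), so neurons added
    later (larger i) receive strictly lower importance."""
    i = np.arange(width)
    w = np.exp(-i / lam)
    return w / w.sum()  # normalize to a probability mass function

def soft_ordered_linear(x: np.ndarray, W: np.ndarray, b: np.ndarray,
                        lam: float) -> np.ndarray:
    """Affine layer whose activations are rescaled by the decreasing
    importance weights, imposing a soft ordering on the neurons."""
    h = x @ W + b
    return h * importance_weights(W.shape[1], lam)
```

Because the rescaling is strictly decreasing in the neuron index, permuting two neurons changes the function the layer computes, which is how the mechanism breaks the usual permutation symmetry of hidden units.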
Post-hoc network truncation and compression capabilities

The soft ordering of neurons enables straightforward post-training compression by removing the least important neurons (last rows/columns of weight matrices), providing a controllable performance-efficiency trade-off without additional training cost. The framework also supports online compression during training via regularization.

10 retrieved papers
Can Refute
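Since the least important neurons occupy the trailing rows/columns of the weight matrices, the truncation described above reduces to array slicing. A minimal sketch, assuming a single hidden layer between two dense weight matrices (the function name and layer setup are illustrative):

```python
import numpy as np

def truncate_hidden_layer(W_in: np.ndarray, b_in: np.ndarray,
                          W_out: np.ndarray, keep: int):
    """Post-hoc compression: because neurons are importance-ordered,
    dropping the last (width - keep) hidden units means slicing off the
    last columns of the incoming weights, the tail of the bias, and the
    last rows of the outgoing weights -- no retraining involved."""
    return W_in[:, :keep], b_in[:keep], W_out[:keep, :]
```

Sweeping `keep` from the full width down to 1 traces out the performance-versus-compute trade-off curve at essentially zero cost, since each point is a slice of the already-trained matrices.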

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Adaptive Width Neural Networks (AWNN) framework

The authors propose a probabilistic framework that learns the number of neurons in each layer during training through backpropagation, without requiring a fixed upper bound on layer width. This is achieved by maximizing a variational objective (ELBO) over both network parameters and layer widths.

Contribution 2: Soft ordering of neurons via monotonically decreasing importance function

The authors introduce a mechanism that rescales neuron activations using a monotonically decreasing importance function (implemented as a discretized exponential distribution). This imposes an ordering where newly added neurons have lower importance, enabling smooth width adaptation and breaking parametrization symmetries.

Contribution 3: Post-hoc network truncation and compression capabilities

The soft ordering of neurons enables straightforward post-training compression by removing the least important neurons (last rows/columns of weight matrices), providing a controllable performance-efficiency trade-off without additional training cost. The framework also supports online compression during training via regularization.