A Recovery Guarantee for Sparse Neural Networks
Overview
Overall Novelty Assessment
The paper establishes the first provable guarantees for recovering sparse weights in two-layer ReLU networks using iterative hard thresholding. It resides in the 'Theoretical Recovery Guarantees' leaf, which contains only four papers total, indicating a relatively sparse research direction within the broader taxonomy of 50 papers. This leaf focuses specifically on provable algorithms for exact or approximate weight recovery, distinguishing it from empirical pruning methods and training-time sparsification techniques that dominate other branches of the field.
The taxonomy reveals that neighboring work clusters around compressed sensing with generative models, sparse signal recovery applications, and training-induced sparsity methods. The paper's theoretical recovery focus diverges from the more crowded pruning and compression branches, which emphasize post-training interventions without recovery guarantees. Its sibling papers in the same leaf address one-bit compressed sensing variants and learning guarantees for shallow networks, suggesting the paper extends classical compressed sensing ideas to ReLU network weight recovery rather than exploring activation sparsity or architectural design.
Among the 30 candidates examined (10 per contribution), only the contribution on structural properties enabling IHT recovery faces a potentially refuting candidate; the other two contributions, the first recovery guarantee and the memory-efficient IHT algorithm, show no clear refutations among their candidates. This limited search scope suggests the core recovery guarantee is relatively novel within the examined literature, though the structural conditions may overlap with existing compressed sensing or shallow-network learning theory. The memory-efficiency claim faces no direct prior work among the candidates reviewed.
Based on the top-30 semantic matches and citation expansion, the work appears to occupy a sparsely populated niche at the intersection of compressed sensing theory and neural network weight recovery. The analysis does not cover exhaustive literature on general sparse optimization or broader pruning methods, which may contain relevant but less directly comparable prior work. The taxonomy structure confirms this sits in a less crowded theoretical corner compared to empirical pruning or training-time regularization branches.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors establish theoretical conditions under which sparse weights of two-layer scalar-output ReLU networks can be uniquely identified and efficiently recovered using iterative hard thresholding (IHT). This includes both identifiability guarantees and convergence guarantees for the recovery algorithm under random Gaussian data.
The authors identify and formalize structural properties (restricted strong convexity and restricted smoothness) of sparse MLP weights that enable exact recovery. They prove these properties hold with high probability for networks satisfying certain sparsity and weight constraints when trained on Gaussian data.
The authors develop an iterative hard thresholding algorithm that recovers sparse network weights with memory requirements that scale linearly with the sparsity level rather than with the full network size. This contrasts with existing methods such as iterative magnitude pruning, which require training a dense network before sparsifying it.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[21] Robust one-bit recovery via ReLU generative networks: Near-optimal statistical rate and global landscape analysis
[35] Robust one-bit recovery via ReLU generative networks: Improved statistical rates and global landscape analysis
[40] Learning and recovery in the ReLU model
Contribution Analysis
Detailed comparisons for each claimed contribution
First sparse recovery guarantee for ReLU neural network weights
The authors establish theoretical conditions under which sparse weights of two-layer scalar-output ReLU networks can be uniquely identified and efficiently recovered using iterative hard thresholding (IHT). This includes both identifiability guarantees and convergence guarantees for the recovery algorithm under random Gaussian data.
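The paper's guarantee concerns IHT applied to two-layer ReLU network weights; as a point of reference for the algorithmic template, here is a minimal sketch of classical IHT on a least-squares objective with Gaussian measurements. All names, the choice of loss, and the parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w, zero out the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    out[idx] = w[idx]
    return out

def iht(X, y, k, n_iters=500):
    """Iterative hard thresholding for 0.5 * ||X w - y||^2
    subject to ||w||_0 <= k: a gradient step followed by
    projection onto the set of k-sparse vectors."""
    n, d = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # conservative 1/L step size
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)
        w = hard_threshold(w - step * grad, k)
    return w

# Demo: recover a 5-sparse vector from noiseless Gaussian measurements.
rng = np.random.default_rng(0)
d, n, k = 200, 100, 5
w_true = np.zeros(d)
w_true[rng.choice(d, k, replace=False)] = rng.standard_normal(k)
X = rng.standard_normal((n, d))
y = X @ w_true
w_hat = iht(X, y, k)
```

The same two-step structure (gradient update, then top-k projection) is what the paper's convergence analysis must control, with the least-squares loss replaced by the ReLU network loss.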
[4] On Sparsity in Overparametrised Shallow ReLU Networks
[5] ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
[7] Inducing and exploiting activation sparsity for fast inference on deep neural networks
[15] Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning
[30] Overparameterized ReLU Neural Networks Learn the Simplest Model: Neural Isometry and Phase Transitions
[71] Algorithmic and theoretical aspects of sparse deep neural networks
[72] Automatic sparse connectivity learning for neural networks
[73] Efficient Sparse-Winograd Convolutional Neural Networks
[74] Trajectory growth lower bounds for random sparse deep ReLU networks
[75] Norm-based Generalization Bounds for Compositionally Sparse Neural Networks
Structural properties enabling sparse weight recovery via IHT
The authors identify and formalize structural properties (restricted strong convexity and restricted smoothness) of sparse MLP weights that enable exact recovery. They prove these properties hold with high probability for networks satisfying certain sparsity and weight constraints when trained on Gaussian data.
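For context, restricted strong convexity (RSC) and restricted smoothness (RSS) are standard conditions in the IHT literature (e.g., the M-estimation framework of [64]); a common formulation, with symbols chosen here for illustration rather than taken from the paper, is:

```latex
% For a loss f and sparsity level s, RSC with modulus \alpha_s and
% RSS with modulus L_s require, for all w, w' whose combined support
% has at most s nonzero coordinates,
\[
\frac{\alpha_s}{2}\,\lVert w - w' \rVert_2^2
\;\le\;
f(w) - f(w') - \langle \nabla f(w'),\, w - w' \rangle
\;\le\;
\frac{L_s}{2}\,\lVert w - w' \rVert_2^2 .
\]
```

IHT analyses of this type typically establish linear convergence once the restricted condition number $L_s/\alpha_s$ is suitably bounded, which matches the high-probability statement over Gaussian data described above.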
[64] On Iterative Hard Thresholding Methods for High-dimensional M-Estimation
[61] Structured Sparse Regression via Greedy Hard Thresholding
[62] Learning Sparse Distributions using Iterative Hard Thresholding
[63] Deep greedy unfolding: Sorting out argsorting in greedy sparse recovery algorithms
[65] Nonlinear structured signal estimation in high dimensions via iterative hard thresholding
[66] Robust 1-bit Compressed Sensing with Iterative Hard Thresholding
[67] Accelerated iterative hard thresholding
[68] FISTA-Net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging
[69] Sparse CCA via Precision Adjusted Iterative Thresholding
[70] Learning sparse generalized linear models with binary outcomes via iterative hard thresholding
Memory-efficient IHT algorithm for sparse MLP training
The authors develop an iterative hard thresholding algorithm that recovers sparse network weights with memory requirements that scale linearly with the sparsity level rather than with the full network size. This contrasts with existing methods such as iterative magnitude pruning, which require training a dense network before sparsifying it.
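The paper's algorithm is specific to two-layer ReLU networks; purely to illustrate how an IHT iterate can be stored in memory proportional to the sparsity k rather than the ambient dimension d, here is a hypothetical sketch. The function names and the quadratic demo loss are my own, not the paper's, and the dense gradient buffer inside the loop is transient, so only the persistent state between iterations is O(k):

```python
import numpy as np

def to_sparse(w, k):
    """Compress a dense vector to its top-k (index, value) representation."""
    idx = np.argsort(np.abs(w))[-k:]
    return idx, w[idx]

def to_dense(idx, val, d):
    """Expand an (index, value) pair back to a dense length-d vector."""
    w = np.zeros(d)
    w[idx] = val
    return w

def sparse_iht(grad_fn, d, k, step, n_iters):
    """IHT that keeps only the k-sparse iterate between iterations.
    grad_fn(w_dense) -> dense gradient; the dense buffer exists only
    inside one step, so persistent memory is O(k), not O(d)."""
    idx, val = np.array([], dtype=int), np.array([])
    for _ in range(n_iters):
        w = to_dense(idx, val, d)
        w -= step * grad_fn(w)
        idx, val = to_sparse(w, k)
    return idx, val

# Demo: quadratic loss 0.5 * ||w - w_true||^2 with a 3-sparse target.
d, k = 20, 3
w_true = np.zeros(d)
w_true[[1, 5, 9]] = [2.0, -1.0, 0.5]
idx, val = sparse_iht(lambda w: w - w_true, d, k, step=1.0, n_iters=5)
```

The contrast with iterative magnitude pruning noted above is exactly this storage pattern: IMP trains and stores a dense parameter vector before pruning, whereas the sparse iterate here never needs dense persistent storage.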