Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: High-dimensional Statistics, Machine learning theory, sparse regression, Lasso, heterogeneous noise, LLM annotations
Abstract:

We study sparse recovery when observations come from mixed-quality sources: a small collection of high-quality measurements with small noise variance and a larger collection of lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that (n1, n2) must satisfy a linear trade-off defining the Price of Quality: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder is completely agnostic to the quality of the data, the Price of Quality is uniformly bounded; in particular, one high-quality sample is never worth more than two low-quality samples. In the informed setting, where the decoder is informed of per-sample variances, the Price of Quality can grow arbitrarily large. On the algorithmic side, we analyze the LASSO in the agnostic setting and show that the recovery threshold matches the homogeneous-noise case and depends only on the average noise level, revealing a striking robustness of computational recovery to data heterogeneity. Together, these results give the first conditions for sparse recovery with mixed-quality data and expose a fundamental difference between how the information-theoretic and algorithmic thresholds adapt to changes in data quality.
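The two-source observation model can be made concrete with a small generator. This is a sketch under assumptions that are not taken from the paper: a Gaussian design with i.i.d. N(0, 1) entries, Gaussian noise, and the hypothetical names `Observation`, `draw_samples`, `agnostic_view`, `informed_view`, and `average_noise_variance`. The only difference between the two decoders is whether each observation's noise level travels with it.

```python
import random
from dataclasses import dataclass


@dataclass
class Observation:
    """One draw y = <x, beta> + sigma * eps with eps ~ N(0, 1) (assumed model)."""
    x: list
    y: float
    sigma: float  # noise std; hidden from an agnostic decoder


def draw_samples(beta, n1, n2, sigma1, sigma2, seed=0):
    """Draw n1 high-quality rows (std sigma1), then n2 low-quality rows (std sigma2)."""
    rng = random.Random(seed)
    samples = []
    for i in range(n1 + n2):
        sigma = sigma1 if i < n1 else sigma2
        x = [rng.gauss(0.0, 1.0) for _ in beta]  # i.i.d. N(0, 1) design row
        signal = sum(xj * bj for xj, bj in zip(x, beta))
        samples.append(Observation(x, signal + sigma * rng.gauss(0.0, 1.0), sigma))
    return samples


def agnostic_view(samples):
    """An agnostic decoder sees only (x, y) pairs."""
    return [(s.x, s.y) for s in samples]


def informed_view(samples):
    """An informed decoder also sees each sample's noise std."""
    return [(s.x, s.y, s.sigma) for s in samples]


def average_noise_variance(samples):
    """Mean of the per-sample noise variances sigma_i ** 2."""
    return sum(s.sigma ** 2 for s in samples) / len(samples)
```

For instance, `draw_samples([1.0, -1.0] + [0.0] * 8, 20, 80, 0.1, 1.0)` yields 100 observations whose average noise variance is (20 · 0.01 + 80 · 1) / 100 = 0.802.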

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper establishes information-theoretic and algorithmic conditions for sparse recovery when measurements come from mixed-quality sources with heterogeneous noise variances. It resides in the 'Theoretical Foundations and Recovery Guarantees' leaf, which contains only three papers total, indicating a relatively sparse research direction focused on fundamental limits rather than application-specific methods. This leaf sits within the broader 'Sparse Signal Recovery Methods Under Non-Uniform Noise' branch, distinguishing itself from sibling categories addressing direction-of-arrival estimation or graph signal reconstruction by emphasizing rigorous recovery thresholds and sample-complexity trade-offs.

The taxonomy reveals that most neighboring work addresses either application-driven scenarios (DOA estimation with nine papers, imaging with four papers) or alternative noise models (impulsive noise with six papers, quantized sensing with three papers). The paper's theoretical focus contrasts with these domain-tailored approaches. Nearby branches explore non-uniform sampling patterns and structured sparsity, but the scope notes clarify these exclude the heterogeneous-noise variance setting central to this work. The taxonomy structure suggests that foundational theory for mixed-quality data remains less developed than methods for uniform noise or specific application contexts.

Among thirty candidates examined across the three contributions, none was identified as clearly refuting the paper's claims. Ten candidates were reviewed for each contribution (the sufficient conditions, the Price of Quality concept, and the LASSO extension), and none yielded a refutable match. This absence of refutation within the limited search scope suggests that the specific framing, namely quantifying sample-size trade-offs between high- and low-quality measurements and analyzing LASSO robustness to data heterogeneity, may represent a novel angle within sparse recovery theory.

Based on the top-thirty semantic matches and taxonomy positioning, the work appears to address an underexplored theoretical gap. The sparse population of its taxonomy leaf and the lack of refutable prior work within the examined candidates indicate potential novelty, though the limited search scope means exhaustive coverage of all relevant literature cannot be claimed. The analysis covers foundational recovery guarantees but does not extend to application-specific validation or algorithmic implementations beyond LASSO.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: sparse recovery with heterogeneous noise observations. The field addresses the challenge of reconstructing sparse signals when measurements are corrupted by noise that varies in character or intensity across observations. The taxonomy reveals a landscape organized around several complementary perspectives. One major branch focuses on non-uniform noise models, where different measurements may have distinct noise variances or distributions, leading to methods that adaptively weight or reweight observations. Another branch tackles impulsive and heavy-tailed noise, which arises in robust signal processing scenarios where outliers or non-Gaussian disturbances dominate. Quantized and one-bit compressive sensing forms a third pillar, dealing with extreme measurement constraints where observations are coarsely discretized. Additional branches cover non-uniform sampling and reconstruction (e.g., Nonuniform Sparse Fourier[4], Streaming Nonuniform Reconstruction[6]), structured sparsity models that exploit block or group structure (Block Sparse Wavelet[14]), and statistical or Bayesian frameworks that model heterogeneity through hierarchical priors (Deep Heterogeneous Bayesian[13]). Application-specific methods span domains from MRI and NMR (Nonuniform NMR Reconstruction[21], CNN Nonuniform MRI[29]) to direction-of-arrival estimation (Robust Gridless DOA[1], Variational DOA Impulsive[8]) and beyond.

Several active lines of work highlight key trade-offs and open questions. Robust methods for impulsive noise (Complementary Priors Impulsive[18], Robust Tensor Impulsive[41]) often employ variational or heavy-tailed likelihood models, contrasting with classical Gaussian assumptions. Quantized sensing approaches (Onebit DOA Sparse[9], Distributed Onebit Decoding[12]) push the limits of information extraction from minimal bit budgets, raising questions about optimal quantizer design under heterogeneous conditions. Meanwhile, non-uniform sampling strategies (Nonuniform Data Reconstruction[11]) and application-driven methods (Compressed Sensing Applications[2], Spectroscopic Compressed Sensing[31]) explore domain-specific structure to improve recovery guarantees.

The original paper, Price of Quality[0], sits within the theoretical foundations branch of non-uniform noise recovery, examining fundamental limits and recovery guarantees when observation quality varies. Its emphasis on theoretical characterization complements nearby works such as Nonnegative Heterogeneous Recovery[40], which incorporates additional structural constraints, and contrasts with more application-oriented studies like Sonar Denoising Deep[3] that leverage domain-specific architectures. This positioning underscores a broader tension between deriving rigorous performance bounds and designing practical algorithms for diverse noise regimes.

Claimed Contributions

Sufficient conditions for sparse recovery with mixed-quality data

The authors derive sufficient conditions on sample sizes (n1, n2) for recovering sparse signals when observations come from two sources with different noise variances. They analyze both information-theoretic recovery (via maximum likelihood) and algorithmic recovery (via LASSO) in agnostic and informed settings.
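To make the agnostic algorithmic side concrete, here is a minimal pure-Python LASSO solver (cyclic coordinate descent) run on a synthetic two-source instance. It is a sketch under stated assumptions, not the authors' code: the Gaussian design, the sizes, the noise levels, the λ scaling, and the names `lasso_cd`, `soft_threshold`, and `support` are all illustrative choices, and the paper's sufficient conditions are not reproduced here.

```python
import math
import random


def soft_threshold(a, lam):
    """S(a, lam) = sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Every row gets the same weight, so the solver is agnostic to noise levels.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column stays at 0
            # Correlation of column j with the partial residual that adds back b_j.
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def support(b, tol=1e-8):
    """Indices of the nonzero coefficients."""
    return {j for j, v in enumerate(b) if abs(v) > tol}


if __name__ == "__main__":
    # Hypothetical instance: 2 nonzeros out of p = 20; n1 = 30 rows with
    # sigma1 = 0.1 and n2 = 120 rows with sigma2 = 1.0.
    rng = random.Random(1)
    p, n1, n2, sigma1, sigma2 = 20, 30, 120, 0.1, 1.0
    beta = [2.0, -2.0] + [0.0] * (p - 2)
    X, y = [], []
    for i in range(n1 + n2):
        sigma = sigma1 if i < n1 else sigma2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        X.append(row)
        y.append(sum(a * c for a, c in zip(row, beta)) + sigma * rng.gauss(0.0, 1.0))
    n = n1 + n2
    avg_var = (n1 * sigma1 ** 2 + n2 * sigma2 ** 2) / n
    # Illustrative choice: lam proportional to sqrt(avg_var * 2 log p / n).
    lam = 2.0 * math.sqrt(avg_var * 2.0 * math.log(p) / n)
    b_hat = lasso_cd(X, y, lam)
    print("estimated support:", sorted(support(b_hat)))
```

Because every row enters the squared loss with equal weight, the solver never uses the per-row noise levels, which is what "agnostic" means here. Whether the printed support equals the true one depends on the random draw and on whether (n1, n2) clears the recovery threshold; in practice λ would be tuned, for example by cross-validation.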

10 retrieved papers

Price of Quality: quantifying the trade-off between high-quality and low-quality samples

The authors introduce and quantify the Price of Quality, which measures how many low-quality samples are needed to replace one high-quality sample. They show this price is uniformly bounded (at most two) in the agnostic setting but can grow arbitrarily large in the informed setting.
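The exchange-rate bookkeeping can be sketched as follows; both helpers are hypothetical. `equivalent_low_quality` assumes a linear trade-off with slope `price`, matching the description of the Price of Quality above; under the agnostic bound (price at most 2), the pair (n1, n2) is therefore worth at most 2·n1 + n2 low-quality samples. `fisher_exchange_rate` is a separate heuristic, not the paper's formula: in a linear model with known variances, a sample with noise standard deviation sigma contributes x xᵀ / sigma² to the Fisher information, so one sample at sigma1 carries as much information as (sigma2 / sigma1)² samples at sigma2. That rate grows without bound as sigma1 shrinks, in line with the unbounded informed-setting price.

```python
def equivalent_low_quality(n1, n2, price):
    """Low-quality sample count equivalent to (n1, n2) under a linear trade-off.

    Hypothetical helper: one high-quality sample is assumed to be worth
    `price` low-quality samples (the Price of Quality).
    """
    if price < 0:
        raise ValueError("price must be non-negative")
    return price * n1 + n2


def fisher_exchange_rate(sigma1, sigma2):
    """Heuristic exchange rate when per-sample noise levels are known.

    With known variances, a sample with noise std sigma adds x x^T / sigma^2
    to the Fisher information, so one sample at sigma1 matches
    (sigma2 / sigma1) ** 2 samples at sigma2. Not the paper's formula.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("noise levels must be positive")
    return (sigma2 / sigma1) ** 2


# Agnostic cap: with price <= 2, 10 high- plus 50 low-quality samples are worth
# at most 2 * 10 + 50 = 70 low-quality samples.
print(equivalent_low_quality(10, 50, 2.0))
# Informed heuristic: the rate grows as the high-quality noise level shrinks.
for s1 in (0.5, 0.1, 0.01):
    print(s1, fisher_exchange_rate(s1, 1.0))
```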

10 retrieved papers

Extension of LASSO recovery conditions to heterogeneous-noise setting

The authors prove that the LASSO recovery threshold in the heterogeneous-noise agnostic setting matches the homogeneous-noise case and depends only on the average noise level. This reveals that high-quality and low-quality data contribute equally to reaching the algorithmic threshold, showing robustness of computational recovery to data heterogeneity.
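A one-line variance calculation gives some intuition for why an unweighted (agnostic) LASSO would respond only to the average noise level. This is a heuristic, not the authors' proof; it assumes design entries X_{ij} that are i.i.d., zero-mean, and unit-variance, independent of zero-mean noise w_i with Var(w_i) = σ_i², where σ_i equals σ_1 on the n_1 high-quality rows and σ_2 on the n_2 low-quality rows:

```latex
\operatorname{Var}\!\Big(\frac{1}{n}\sum_{i=1}^{n} X_{ij}\, w_i\Big)
  = \frac{1}{n^{2}}\sum_{i=1}^{n} \mathbb{E}\big[X_{ij}^{2}\big]\,\sigma_i^{2}
  = \frac{\bar\sigma^{2}}{n},
\qquad
n = n_1 + n_2,
\quad
\bar\sigma^{2} = \frac{n_1\sigma_1^{2} + n_2\sigma_2^{2}}{n_1 + n_2}.
```

The regularization level λ is typically chosen to dominate the noise term (1/n) Xᵀw in the LASSO's optimality conditions, and by this calculation each coordinate of that term has variance fixed by n and the average variance alone; how the total noise variance is split between the two sources does not enter. Only second moments are tracked here, so this is a heuristic rather than a recovery guarantee.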

10 retrieved papers

