Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: high-dimensional statistics, algorithmic statistics, computational learning theory, coarse observations, mean estimation, linear regression, friction
Abstract:

Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample x is drawn from a d-dimensional Gaussian distribution with identity covariance but is revealed only through the set of a partition containing x. When the coarse samples carry too little information, the mean cannot be uniquely recovered from the observed samples (i.e., the problem is not identifiable). Recent work by Fotakis et al. (2021) established that sample-efficient mean estimation is possible when the unknown mean is identifiable and the partition consists only of convex sets. Moreover, they showed that without convexity, mean estimation becomes NP-hard. However, two fundamental questions remained open:

  1. When is the mean identifiable under convex partitions?
  2. Is computationally efficient estimation possible under identifiability and convex partitions?

This work resolves both questions. First, we give a geometric characterization of when a convex partition is identifiable, showing that identifiability fails exactly when the convex sets form parallel "slabs" in some direction. Second, we give the first polynomial-time algorithm for finding ε-accurate estimates of the Gaussian mean given coarse samples from an unknown convex partition, matching the optimal Õ(d/ε²) sample complexity. Our results have direct applications to robust machine learning, particularly robustness to observation rounding. As a concrete example, we derive a sample- and computationally efficient algorithm for linear regression with market friction, a canonical problem in economics where exact prices are unobserved and one only sees a range containing the price (Rosett, 1959).
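As a toy illustration of the coarse observation model (not the paper's algorithm), the sketch below simulates one-dimensional Gaussian samples that are revealed only through a unit-width rounding partition, and recovers the mean with a plain EM loop for interval-censored Gaussian data. All names, the partition width, and the iteration count are illustrative assumptions.

```python
import math
import random

def npdf(z):  # standard normal density
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def ncdf(z):  # standard normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def coarsen(x, width=1.0):
    """Reveal only the partition cell (here: a rounding interval) containing x."""
    lo = math.floor(x / width) * width
    return (lo, lo + width)

def em_mean(intervals, iters=60):
    """EM for the mean of N(mu, 1) given interval observations [a, b]."""
    mu = 0.0
    for _ in range(iters):
        total = 0.0
        for a, b in intervals:
            alpha, beta = a - mu, b - mu
            z = ncdf(beta) - ncdf(alpha)
            # conditional mean E[x | x in [a, b]] under the current mu
            total += mu + (npdf(alpha) - npdf(beta)) / z
        mu = total / len(intervals)  # M-step: average of conditional means
    return mu

random.seed(0)
true_mu = 0.7
samples = [random.gauss(true_mu, 1.0) for _ in range(20000)]
intervals = [coarsen(x) for x in samples]
print(round(em_mean(intervals), 2))  # close to true_mu
```

Here the rounding cells are convex (intervals), so by the paper's setting the mean remains identifiable and a simple likelihood-based scheme can recover it.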

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper addresses Gaussian mean estimation when observations are revealed only through coarse partitions rather than exact values. It sits in the 'Coarse Partition-Based Mean Estimation' leaf, which contains only two papers total (including the original work). This represents a relatively sparse research direction within the broader taxonomy of 36 papers across quantization, distributed estimation, and functional data methods. The sibling paper focuses on particle filtering applications, suggesting limited prior work directly tackling the theoretical foundations of convex partition-based estimation.

The taxonomy reveals that neighboring branches emphasize different observation models: 'Quantization-Based Mean Estimation' (13 papers) studies discrete-level mappings with known quantizer designs, while 'Distributed Mean Estimation' (4 papers) addresses communication-constrained protocols across multiple agents. The original work diverges by considering arbitrary convex partitions without assuming structured quantization or distributed settings. The scope note for the leaf explicitly excludes point quantization and distributed protocols, positioning this work at the intersection of geometric constraints (convexity) and identifiability theory rather than communication or bit-budget optimization.

Among 22 candidates examined, the contribution on polynomial-time algorithms shows overlap with 2 refutable candidates from 10 examined, suggesting some prior computational work exists in related settings. The geometric identifiability characterization examined 2 candidates with no refutations, indicating this theoretical angle may be less explored. The linear regression extension examined 10 candidates without refutations, pointing to potential novelty in applying coarse-data frameworks to regression. The limited search scope (top-K semantic matches) means these statistics reflect nearby literature rather than exhaustive coverage of all estimation theory.

Given the sparse leaf structure and the limited 22-candidate search, the work appears to occupy a relatively underexplored niche bridging partition geometry and statistical identifiability. The computational contribution faces some prior overlap, while the identifiability characterization and regression application show fewer direct precedents among examined papers. A broader search beyond semantic neighbors might reveal additional connections to convex geometry or interval-censored data literature not captured here.

Taxonomy

Core-task Taxonomy Papers: 36
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Papers: 2

Research Landscape Overview

Core task: Gaussian mean estimation from coarse data. This field addresses the challenge of estimating the mean of a Gaussian distribution when observations are degraded by quantization, partitioning, or communication constraints. The taxonomy reveals several main branches that reflect different facets of this problem. Quantization-Based Mean Estimation focuses on scenarios where continuous observations are mapped to discrete levels, often exploring optimal quantizer design and the resulting estimation error, as seen in works like Quantized Gaussian Dither[9] and Optimal Scalar Quantization[35]. Distributed Mean Estimation Under Communication Constraints examines settings where multiple agents must collaborate under bandwidth limitations, with studies such as Quantized Message Passing[3] and Geometric Communication Bounds[34] addressing trade-offs between communication cost and accuracy. Coarse Partition-Based Mean Estimation considers data that arrive in broad bins or intervals rather than precise quantization levels. Meanwhile, Continuous and Functional Mean Estimation extends the problem to infinite-dimensional or functional data, Robust and High-Dimensional Mean Estimation tackles outliers and scaling challenges, and Application-Specific Mean Estimation applies these ideas to domains like viral load tracking or service metrics.

Several active lines of work highlight contrasting emphases and open questions. One thread investigates the interplay between quantization granularity and estimation performance, with papers like MMSE Quantized Nonasymptotic[33] and One-Bit Mean Estimation[7] pushing the limits of minimal-bit regimes. Another explores adaptive and data-dependent quantizers, as in Adaptive Quantizers[36] and Quantization Adaptation Estimation[32], which adjust binning strategies based on observed statistics.
The original paper, Coarse Mean Estimation[0], sits within the Coarse Partition-Based branch and shares thematic ground with Bernoulli Particle Filter[20], both addressing estimation from relatively crude partitions of the observation space. Compared to the fine-grained quantization focus of One-Bit Mean Estimation[7] or the communication-centric perspective of Quantized Message Passing[3], Coarse Mean Estimation[0] emphasizes scenarios where data are inherently grouped into coarse bins, raising questions about optimal partition design and the fundamental limits of inference under such constraints.

Claimed Contributions

Geometric characterization of identifiability for convex partitions

The authors characterize exactly when Gaussian mean estimation from coarse data is identifiable under convex partitions. They prove that a convex partition is non-identifiable if and only if almost all sets in the partition are parallel slabs in some direction, combining tools from optimal transport and variance reduction inequalities.

2 retrieved papers
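The slab condition above can be checked numerically. In the sketch below (a hypothetical illustration, not taken from the paper), the partition cells are parallel slabs orthogonal to a unit direction v, so the probability of each cell depends on the mean only through its projection onto v; two means that agree on that projection induce identical coarse-sample distributions, which is exactly the failure of identifiability.

```python
import math

def ncdf(z):  # standard normal CDF
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def slab_prob(a, b, v, mu):
    """P(a <= <v, x> < b) for x ~ N(mu, I) and a unit direction v.
    The projection <v, x> is a 1-D Gaussian with mean <v, mu> and variance 1."""
    m = sum(vi * mi for vi, mi in zip(v, mu))
    return ncdf(b - m) - ncdf(a - m)

v = (1.0, 0.0)                                # slab direction
slabs = [(-2, -1), (-1, 0), (0, 1), (1, 2)]   # cells {x : a <= x_1 < b}
mu1 = (0.3, 0.0)
mu2 = (0.3, 5.0)   # same projection on v, very different orthogonal component
for a, b in slabs:
    # cell probabilities agree, so coarse samples cannot tell mu1 from mu2
    assert abs(slab_prob(a, b, v, mu1) - slab_prob(a, b, v, mu2)) < 1e-12
```

Any convex partition that is not of this parallel-slab form leaves enough directional information to pin the mean down, per the characterization claimed above.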
First polynomial-time algorithm for coarse Gaussian mean estimation

The authors provide the first computationally efficient algorithm for estimating the Gaussian mean from coarse samples under identifiable convex partitions. The algorithm runs in polynomial time and achieves the same sample complexity as prior sample-efficient but computationally inefficient methods.

10 retrieved papers (Can Refute)
Sample- and computationally-efficient algorithm for linear regression with market friction

As a concrete application, the authors develop an efficient algorithm for linear regression with market friction, where exact prices are unobserved and only ranges containing prices are available. This addresses a canonical economics problem dating back to Rosett (1959).

10 retrieved papers
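To make the friction setting concrete, here is a minimal stdlib-only sketch of interval-censored linear regression in the spirit of Rosett's model: the latent price y = θ·w + noise is never observed, only a unit interval containing it, and θ is fit by gradient ascent on the interval log-likelihood. The data-generating choices, step sizes, and routine names are assumptions for illustration, not the authors' algorithm.

```python
import math
import random

def npdf(z): return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
def ncdf(z): return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def fit_theta(data, steps=300, lr=0.2):
    """Gradient ascent on the interval-censored log-likelihood
    sum_i log( Phi(b_i - theta*w_i) - Phi(a_i - theta*w_i) ),
    with unit noise variance and scalar feature w."""
    theta = 0.0
    n = len(data)
    for _ in range(steps):
        g = 0.0
        for w, a, b in data:
            m = theta * w
            alpha, beta = a - m, b - m
            z = ncdf(beta) - ncdf(alpha)
            # d/d(theta) of log z, using d(ncdf)/dm = -npdf
            g += (npdf(alpha) - npdf(beta)) / z * w
        theta += lr * g / n
    return theta

random.seed(1)
true_theta = 1.5
data = []
for _ in range(4000):
    w = random.uniform(-2, 2)
    y = true_theta * w + random.gauss(0, 1)  # latent exact price, never observed
    lo = math.floor(y)                       # only a unit price range is seen
    data.append((w, lo, lo + 1.0))
print(round(fit_theta(data), 2))  # close to true_theta
```

This mirrors the abstract's application: each observation is a convex set (an interval) containing the true price, and the fit still recovers the regression coefficient.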

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Geometric characterization of identifiability for convex partitions


Contribution

First polynomial-time algorithm for coarse Gaussian mean estimation


Contribution

Sample- and computationally-efficient algorithm for linear regression with market friction
