Differentially Private Domain Discovery

ICLR 2026 Conference SubmissionAnonymous Authors
Differential PrivacyPartition SelectionTop-k Selection
Abstract:

We study several problems in differentially private domain discovery, where each user holds a subset of items from a shared but unknown domain, and the goal is to output an informative subset of items. For set union, we show that the simple baseline Weighted Gaussian Mechanism (WGM) has a near-optimal 1\ell_1 missing mass guarantee on Zipfian data as well as a distribution-free \ell_\infty missing mass guarantee. We then apply the WGM as a domain-discovery precursor for existing known-domain algorithms for private top-kk and kk-hitting set and obtain new utility guarantees for their unknown domain variants. Finally, experiments demonstrate that all of our WGM-based methods are competitive with or outperform existing baselines for all three problems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes near-optimal missing mass guarantees for the Weighted Gaussian Mechanism (WGM) on Zipfian data and extends WGM to unknown-domain variants of top-k and k-hitting set problems. It sits in the 'Weighted Gaussian Mechanisms for Set Union' leaf under 'Core Domain Discovery and Set Union Mechanisms'. Notably, this leaf contains only the original paper itself—no sibling papers are present. This suggests the specific focus on WGM-based set union with missing mass analysis occupies a relatively sparse position within the broader domain discovery landscape.

The taxonomy reveals neighboring research directions that contextualize this work. The sibling leaf 'Adaptive Partition Selection' addresses scalable item selection from unbounded universes using adaptive weighting, which shares methodological overlap but excludes non-adaptive set union approaches. The adjacent branch 'Top-k Selection Over Unknown Domains' contains stability-based and composition-based methods for ranking frequent items, representing a complementary problem formulation. The 'Histogram Release and Continual Observation' branch tackles dynamic settings with streaming data, while 'Local Differential Privacy for Frequency Estimation' explores client-side randomization without trusted curators—both distinct from the centralized set union focus here.

Among the three contributions analyzed, the literature search examined four candidate papers total. The contribution 'WGM-based algorithms for unknown domain top-k and k-hitting set' was assessed against all four candidates, with zero refutable overlaps identified. The other two contributions—near-optimal missing mass guarantees on Zipfian data and reframing set union utility via missing mass—were not matched against any candidates in the search. This limited scope (four papers examined, not dozens) suggests the analysis captures closely related work but does not constitute an exhaustive survey of all potential prior art in domain discovery or weighted noise mechanisms.

Given the sparse taxonomy position and limited search scope, the work appears to occupy a relatively underexplored niche within differentially private domain discovery. The absence of sibling papers and zero refutable overlaps among examined candidates suggest the specific combination of WGM, missing mass analysis, and unknown-domain extensions may be novel. However, the small candidate pool (four papers) means broader connections to related mechanism design or domain discovery literature may exist beyond this analysis.

Taxonomy

Core-task Taxonomy Papers
8
3
Claimed Contributions
4
Contribution Candidate Papers Compared
0
Refutable Paper

Research Landscape Overview

Core task: Differentially private domain discovery with unknown item domains. This field addresses the challenge of privately identifying and releasing information about data domains when the set of possible items is not known in advance. The taxonomy reveals several interconnected research directions. Core Domain Discovery and Set Union Mechanisms focus on fundamental techniques for aggregating sets while preserving privacy, often employing weighted noise injection strategies. Top-k Selection Over Unknown Domains tackles the problem of identifying the most frequent items without prior knowledge of the domain, with works like Practical Private Top-k[3] and Private Top-k Stability[2] exploring stability-based approaches. Histogram Release and Continual Observation addresses dynamic settings where data arrives over time, as seen in Streaming Private Histograms[5]. Local Differential Privacy for Frequency Estimation considers the more stringent local model where users perturb their own data before sharing. Finally, Bounded Noise Mechanisms for Valid Outputs, exemplified by Bounded Gaussian Mechanism[8], ensure that noisy outputs remain within valid ranges. Several active themes emerge across these branches. A central tension involves balancing utility and privacy when the domain size is unknown or potentially unbounded, requiring adaptive strategies that allocate privacy budget efficiently. Works in top-k selection emphasize stability and practical performance, while histogram release methods must handle continual updates without exhausting privacy budgets. Private Domain Discovery[0] sits within the Core Domain Discovery branch, specifically focusing on weighted Gaussian mechanisms for set union operations. Its emphasis on handling unknown domains through carefully calibrated noise aligns closely with the foundational concerns of Unknown Domain Privacy Framework[4], while its mechanism design shares methodological connections with Bounded Gaussian Mechanism[8]. Compared to top-k selection works like Practical Private Top-k[3], Private Domain Discovery[0] addresses the broader problem of discovering entire domains rather than selecting a fixed number of top items, representing a complementary approach to unknown-domain privacy challenges.

Claimed Contributions

Near-optimal missing mass guarantees for WGM on Zipfian data

The authors prove that the Weighted Gaussian Mechanism achieves near-optimal l1 missing mass bounds for datasets exhibiting Zipfian (power-law) frequency distributions, and also establish distribution-free l∞ missing mass guarantees. These are the first absolute utility guarantees for differentially private set union.

0 retrieved papers
WGM-based algorithms for unknown domain top-k and k-hitting set

The authors develop a meta-algorithm that uses WGM for domain discovery followed by known-domain algorithms for top-k selection and k-hitting set problems. They prove utility guarantees for these unknown domain variants by leveraging their l∞ missing mass bounds.

4 retrieved papers
Reframing set union utility in terms of missing mass

The authors introduce a new perspective for evaluating differentially private set union by measuring the fraction of total item mass recovered rather than counting unique items. This reframing enables them to prove meaningful utility guarantees where previous cardinality-based approaches could not.

0 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Near-optimal missing mass guarantees for WGM on Zipfian data

The authors prove that the Weighted Gaussian Mechanism achieves near-optimal l1 missing mass bounds for datasets exhibiting Zipfian (power-law) frequency distributions, and also establish distribution-free l∞ missing mass guarantees. These are the first absolute utility guarantees for differentially private set union.

Contribution

WGM-based algorithms for unknown domain top-k and k-hitting set

The authors develop a meta-algorithm that uses WGM for domain discovery followed by known-domain algorithms for top-k selection and k-hitting set problems. They prove utility guarantees for these unknown domain variants by leveraging their l∞ missing mass bounds.

Contribution

Reframing set union utility in terms of missing mass

The authors introduce a new perspective for evaluating differentially private set union by measuring the fraction of total item mass recovered rather than counting unique items. This reframing enables them to prove meaningful utility guarantees where previous cardinality-based approaches could not.

Differentially Private Domain Discovery | Novelty Validation