On Differential Private $\ell_1$, $\ell_2$ and $\ell_p^p$ Distance Queries

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: Differential Privacy, Kernel Density Estimation, Distance Query, Data Structure, Balanced Binary Tree
Abstract:

We introduce a refined differentially private (DP) data structure for kernel density estimation (KDE) with $\ell_1$, $\ell_2$ and $\ell_p^p$ kernels. This new DP data structure offers not only an improved privacy-utility tradeoff but also better query efficiency than prior results. Specifically, we study the following mathematical problem: given a similarity function $f$ (for DP KDE) and a private dataset $X \subset \mathbb{R}^d$, our goal is to preprocess $X$ so that for any query $y \in \mathbb{R}^d$, we can approximate $\sum_{x \in X} f(x, y)$ in a differentially private fashion. The best previous algorithm for $f(x,y) = \|x - y\|_1$ is the node-contaminated balanced binary tree of [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. Their algorithm requires $O(nd)$ space and preprocessing time, where $n = |X|$. For any query point, the query time is $O(\alpha^{-1} d \log^2 n)$, with a $(1+\alpha)$-approximation guarantee and additive error $\epsilon^{-1} \alpha^{-0.5} d^{1.5} R \log^{1.5} n$.
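For concreteness, the quantity being approximated admits a direct non-private reference implementation. The function names below are ours, and this is a plain Python sketch of the problem statement, not anything from the paper:

```python
def l1_kernel_sum(X, y):
    """Exact, non-private value of sum_{x in X} ||x - y||_1."""
    return sum(sum(abs(xj - yj) for xj, yj in zip(x, y)) for x in X)

def lpp_kernel_sum(X, y, p):
    """Exact, non-private value of sum_{x in X} ||x - y||_p^p."""
    return sum(sum(abs(xj - yj) ** p for xj, yj in zip(x, y)) for x in X)

# e.g. l1_kernel_sum([[0, 0], [1, 1]], [0, 0]) -> 2
```

A DP structure must answer such queries with calibrated noise in place of these exact sums, which is where the privacy-utility tradeoff arises.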

In this paper, using the same space and preprocessing time, we improve the best previous result [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024] in three aspects:

  • We reduce the query time by an $\alpha^{-1} \log n$ factor
  • We improve the approximation ratio from $1+\alpha$ to $1$ (i.e., exact)
  • We reduce the error dependence by a factor of $\alpha^{-0.5}$

From a technical perspective, our method of constructing the search tree differs from previous work [Backurs, Lin, Mahabadi, Silwal, and Tarnawski, ICLR 2024]. In prior work, for each query, the answer is split into $\alpha^{-1} \log n$ numbers, each derived from the summation of $\log n$ values in interval-tree counts. In contrast, we construct the tree differently, splitting the answer into $\log n$ numbers, where each is a smart combination of two distance values, two counting values, and $y$ itself. We believe our tree structure may be of independent interest.
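One way to picture this kind of decomposition (our own one-dimensional sketch, not the paper's construction): sort the points, build a balanced tree over them, and let each node store the count and coordinate sum of its range. Every node lying entirely on one side of the query $y$ contributes its part of the answer through a single combination of those stored values and $y$, and only $O(\log n)$ such nodes are touched per query:

```python
from itertools import accumulate

class L1SumTree:
    """Illustrative 1-d analogue: a balanced tree over sorted points,
    each node storing (count, coordinate sum) for its index range."""

    def __init__(self, points):
        self.xs = sorted(points)
        self.prefix = [0] + list(accumulate(self.xs))  # prefix sums

    def _cnt_sum(self, lo, hi):
        # (count, sum) of sorted points with indices in [lo, hi)
        return hi - lo, self.prefix[hi] - self.prefix[lo]

    def query(self, y):
        """Exact sum of |x - y| over all points."""
        return self._go(0, len(self.xs), y)

    def _go(self, lo, hi, y):
        if lo >= hi:
            return 0.0
        if self.xs[hi - 1] <= y:          # node entirely left of y
            cnt, s = self._cnt_sum(lo, hi)
            return cnt * y - s            # one count/sum/y combination
        if self.xs[lo] >= y:              # node entirely right of y
            cnt, s = self._cnt_sum(lo, hi)
            return s - cnt * y
        mid = (lo + hi) // 2              # y splits the node: recurse
        return self._go(lo, mid, y) + self._go(mid, hi, y)
```

Since the points are sorted, at most one node per level straddles $y$, so the recursion releases $O(\log n)$ one-sided pieces; a private version would noise each piece's stored values once.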

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a refined differentially private data structure for kernel density estimation with ℓ₁, ℓ₂, and ℓₚᵖ kernels, improving query time, error bounds, and privacy-utility tradeoffs over prior work. It resides in the Tree-Based Private Distance Query Structures leaf, which contains only two papers including the original submission. This leaf sits within the Core Algorithmic Mechanisms branch, indicating the work addresses foundational algorithmic techniques rather than application-specific implementations. The sparse population of this leaf suggests tree-based approaches for private distance queries remain an emerging research direction with limited prior exploration.

The taxonomy reveals neighboring work in General Private Similarity Computation Frameworks, which abstracts kernel-based similarity queries without specializing to tree structures, and Private Density Estimation with Distributional Guarantees, which emphasizes statistical convergence metrics like Wasserstein distance rather than query efficiency. The paper's focus on preprocessing datasets for repeated interactive queries distinguishes it from one-shot density release methods in sibling branches. Geometry-Aware and Manifold-Based Private Mechanisms explore non-Euclidean settings, while Application-Specific branches target domains like location-based services, both orthogonal to the core algorithmic emphasis here.

Among the three contributions analyzed, the first two—refined ℓ₁ kernel data structure and novel tree construction—each examined ten candidates and found one refutable prior work, indicating some overlap with existing tree-based methods within the limited search scope of twenty-four papers. The third contribution, generalizing to ℓ₂ and ℓₚᵖ kernels via dimensional reduction, examined four candidates with no refutations, suggesting this extension may be less directly addressed in the examined literature. The analysis explicitly covers top-K semantic matches and citation expansion, not an exhaustive survey, so unexamined work may exist beyond this scope.

Given the limited search scale and the sparse taxonomy leaf, the work appears to advance an active but not yet crowded research direction. The presence of one sibling paper and modest refutation counts suggest incremental refinement of tree-based mechanisms rather than a completely unexplored problem space. The generalization to multiple kernel types and the dimensional reduction technique may offer more distinctive contributions, though the analysis cannot confirm novelty beyond the examined candidate set.

Taxonomy

Core-task Taxonomy Papers: 8
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 2

Research Landscape Overview

Core task: differentially private kernel density estimation with distance queries. The field addresses how to release density estimates or answer similarity queries under formal privacy guarantees, balancing utility with protection of individual records.

The taxonomy organizes work into four main branches. Core Algorithmic Mechanisms for Private Similarity Queries develops foundational techniques—often tree-based or kernel-based structures—that enable efficient private distance computations and nearest-neighbor lookups. Private Density Estimation with Distributional Guarantees focuses on methods that provide strong statistical convergence or distributional accuracy, ensuring that released densities approximate the true distribution under rigorous metrics. Geometry-Aware and Manifold-Based Private Mechanisms exploits low-dimensional or manifold structure to improve utility when data lie on constrained geometries, as seen in works like Conformal Manifold Privacy[7]. Application-Specific Private Data Sharing and Synthesis targets concrete use cases—such as location-based services or synthetic data generation—where domain constraints shape the design of private mechanisms, exemplified by POI Privacy Recommendation[4] and Spatiotemporal Privacy Sharing[6].

Several active lines of work explore trade-offs between computational efficiency, statistical accuracy, and privacy budget allocation. One prominent theme contrasts tree-based indexing strategies, which enable fast query responses, with kernel smoothing approaches that offer smoother density approximations but may require more sophisticated noise calibration. Another thread examines how metric structure—whether Euclidean, Wasserstein, or manifold-based—affects both privacy analysis and utility, as illustrated by Private Wasserstein Density[3] and Metric Local Privacy[5].
Private Distance Queries[0] sits within the tree-based branch alongside Private Kernel Density[1], emphasizing efficient data structures for answering distance queries under differential privacy. Compared to Private Kernel Density[1], which directly estimates densities via kernel methods, Private Distance Queries[0] focuses on organizing data to support repeated similarity lookups, a complementary emphasis that addresses scenarios where interactive querying is central rather than one-shot density release.

Claimed Contributions

Refined DP data structure for ℓ₁ kernel density estimation

The authors propose a new differentially private data structure for kernel density estimation that improves upon prior work by reducing query time by a factor of α^(-1) log n, achieving exact approximation (ratio 1 instead of 1+α), and reducing error dependence by a factor of α^(-0.5), while maintaining the same space and preprocessing time complexity.

10 retrieved papers
Can Refute
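The claimed error saving can be pictured with a deliberately simplified noise accounting (our sketch, assuming a uniform ε/k budget split and per-piece sensitivity R; the paper's analysis is finer): every released piece carries Laplace noise, so shrinking the number of pieces per query shrinks both the per-piece noise scale and the summed noise:

```python
import math
import random

def laplace_noise(scale, rng):
    # inverse-CDF sample from Laplace(0, scale)
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_piece_sum(pieces, eps, sensitivity, rng):
    """Release sum(pieces) from k independently noised pieces under a
    uniform eps/k split (simplified accounting, for intuition only)."""
    k = len(pieces)
    scale = sensitivity * k / eps  # Laplace scale per piece
    return sum(v + laplace_noise(scale, rng) for v in pieces)
```

Under this toy accounting, answering from log n pieces instead of α⁻¹ log n pieces directly reduces the accumulated noise, which is consistent in spirit with the improved error dependence claimed above.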
Novel tree construction method with smart node representation

The authors introduce a new method for constructing the search tree where each node stores the sum of distances from one point to multiple points, allowing query answers to be split into only log n values (instead of α^(-1) log n) through smart combinations of distance and count values, which they believe has independent interest beyond this specific application.

10 retrieved papers
Can Refute
Generalization to ℓ₂ and ℓₚᵖ kernels via dimensional reduction

The authors extend their ℓ₁ kernel results to ℓ₂ and ℓₚᵖ distance kernels using a provable dimension-reduction recipe, achieving improvements over the best previous results for these kernels as well.

4 retrieved papers
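One standard recipe of this kind, shown here for intuition only (the paper's precise reduction may differ), uses the fact that for a Gaussian vector g and fixed z, E|⟨g, z⟩| = √(2/π)·‖z‖₂. Thus an ℓ₂ query can be averaged out of many one-dimensional ℓ₁ queries, each answerable by a 1-d structure:

```python
import math
import random

def l2_via_projections(X, y, m=2000, seed=0):
    """Estimate sum_{x in X} ||x - y||_2 by averaging 1-d l1 distances
    along m random Gaussian directions. Illustrative recipe; function
    name and parameters are ours."""
    rng = random.Random(seed)
    d = len(y)
    c = math.sqrt(math.pi / 2.0)  # corrects E|<g, z>| up to ||z||_2
    est = 0.0
    for _ in range(m):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        gy = sum(gi * yi for gi, yi in zip(g, y))
        # 1-d l1 query along direction g, summed over the dataset
        est += sum(abs(sum(gi * xj for gi, xj in zip(g, x)) - gy) for x in X)
    return c * est / m
```

In a private instantiation, each projected 1-d query would be served by the noised tree structure, with the privacy budget shared across the m directions.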

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Refined DP data structure for ℓ₁ kernel density estimation

Contribution

Novel tree construction method with smart node representation

Contribution

Generalization to ℓ₂ and ℓₚᵖ kernels via dimensional reduction