Natural Identifiers for Privacy and Data Audits in Large Language Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: privacy auditing, natural identifiers, dataset inference, differential privacy, LLMs
Abstract:

Assessing the privacy of large language models (LLMs) presents significant challenges. In particular, most existing methods for auditing differential privacy require the insertion of specially crafted canary data during training, making them impractical for auditing already-trained models without costly retraining. Additionally, dataset inference, which audits whether a suspect dataset was used to train a model, is infeasible without access to a private non-member held-out dataset. Such held-out datasets are often unavailable or difficult to construct in real-world cases, since they must come from the same distribution (IID) as the suspect data. These limitations severely hinder scalable, post-hoc audits. To enable such audits, this work introduces natural identifiers (NIDs) as a solution to the above challenges. NIDs are structured random strings, such as cryptographic hashes and shortened URLs, that naturally occur in common LLM training datasets. Their format enables the generation of unlimited additional random strings from the same distribution, which can act as alternative canaries for audits and as same-distribution held-out data for dataset inference. Our evaluation shows that NIDs enable post-hoc differential privacy auditing without any retraining, as well as dataset inference for any suspect dataset containing NIDs, without requiring a private non-member held-out dataset.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces natural identifiers (NIDs)—structured random strings like cryptographic hashes and shortened URLs—as a mechanism for post-hoc privacy and dataset inference auditing in LLMs. It sits within the 'Post-hoc Auditing Without Retraining' leaf of the taxonomy, which contains only two papers total. This is a relatively sparse research direction compared to more crowded areas like membership inference attacks or differential privacy defenses, suggesting the specific problem of auditing already-trained models without retraining remains underexplored despite its practical importance.

The taxonomy reveals that this work bridges two broader branches: 'Privacy Auditing and Measurement Frameworks' (its parent category) and 'Privacy Attack Methods and Mechanisms' (which includes membership inference and extraction techniques). Neighboring leaves include 'Auditing LLM Adaptations and Fine-Tuning' and 'Meta-Modeling and Statistical Auditing Approaches', which focus on different auditing contexts or methodologies. The scope note for the parent category explicitly emphasizes 'empirically measuring privacy leakage through systematic auditing', while excluding attack methods themselves—positioning this work as a measurement tool rather than a new attack vector.

Among 18 candidates examined across three contributions, the core NID concept (Contribution 1) shows one refutable candidate out of seven examined, while the adapted DP auditing framework (Contribution 2, three candidates) and dataset inference application (Contribution 3, eight candidates) show no clear refutations. The limited search scope means these statistics reflect top-K semantic matches rather than exhaustive coverage. The single refutable candidate for NIDs suggests some prior exploration of similar identifier-based approaches, though the specific adaptation to post-hoc auditing and dataset inference may retain novelty within this constrained search.

Based on the limited literature search of 18 candidates, the work appears to address a genuine gap in post-hoc auditing capabilities, particularly for dataset inference without held-out data. However, the sparse taxonomy leaf and single refutable candidate indicate this assessment is preliminary. A more comprehensive search across the broader auditing and attack literature would be needed to fully characterize the novelty of using naturally occurring structured strings for privacy measurement.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 18
Refutable Papers: 1

Research Landscape Overview

Core task: Post-hoc privacy and dataset inference auditing for large language models. The field has organized itself around several complementary branches that together address the lifecycle of privacy concerns in LLMs. Privacy Attack Methods and Mechanisms explores how adversaries can extract training data or infer membership, often through membership inference attacks or prompt-based extraction techniques. Privacy Auditing and Measurement Frameworks develops systematic approaches to quantify privacy risks without requiring model retraining, including works like Privacy Auditing[6] and LLM-PBE[2] that measure empirical privacy leakage. Privacy Defense and Mitigation Strategies focuses on protective mechanisms such as differential privacy during fine-tuning or data deduplication methods like Deduplicating Training Data[4]. Meanwhile, Privacy Risk Surveys and Comprehensive Reviews, including Privacy Risks Survey[1] and LLM Privacy Survey[12], synthesize knowledge across attack vectors and defenses. System-Level and Side-Channel Privacy Risks examines broader vulnerabilities beyond direct model queries, while Privacy in Specialized Domains and Applications addresses domain-specific concerns in healthcare, education, and other sensitive contexts.

A particularly active line of work centers on post-hoc auditing techniques that assess privacy without retraining, balancing practical deployment constraints against rigorous measurement. Natural Identifiers Privacy[0] sits squarely within this auditing-focused cluster, examining how natural identifiers in training data can be exploited for privacy inference. This work shares methodological ground with Privacy Auditing[6], which similarly emphasizes measurement frameworks applicable to deployed models, and contrasts with attack-oriented studies like User Inference Attacks[5] that primarily demonstrate vulnerabilities rather than audit existing systems.
The tension between developing efficient auditing methods and understanding the full spectrum of privacy risks remains central: some efforts prioritize scalable empirical measurement while others explore theoretical bounds or worst-case scenarios. Natural Identifiers Privacy[0] contributes to the former direction, offering practical auditing insights that complement broader surveys like LLM Privacy Review[10] and specialized attack studies.

Claimed Contributions

Natural identifiers (NIDs) for post-hoc privacy auditing

The authors introduce natural identifiers (NIDs), which are structured random strings (such as cryptographic hashes and shortened URLs) that naturally occur in LLM training datasets. NIDs enable the generation of unlimited same-distribution samples, allowing post-hoc privacy audits without retraining models or requiring dedicated held-out datasets.

7 retrieved papers
Can Refute
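The core property claimed for NIDs is that their fixed format lets an auditor draw unlimited fresh strings from the same distribution. A minimal sketch of that generation step, assuming two illustrative formats (64-character hex hashes and 7-character base62 short-URL codes; the paper's actual format definitions may differ):

```python
import secrets
import string

# Illustrative NID-like formats (assumptions, not the paper's exact definitions).
HEX_CHARS = string.digits + "abcdef"
BASE62_CHARS = string.digits + string.ascii_letters


def fresh_sha256_like() -> str:
    """Draw a fresh 64-character hex string, uniform over the SHA-256 output space."""
    return "".join(secrets.choice(HEX_CHARS) for _ in range(64))


def fresh_shorturl_like(length: int = 7) -> str:
    """Draw a fresh base62 code, as used by common URL shorteners."""
    return "".join(secrets.choice(BASE62_CHARS) for _ in range(length))
```

Because each draw is uniform over the same character set and length as NIDs observed in training data, such strings can serve as distribution-matched non-members.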
Adapted one-run DP auditing framework using NIDs

The authors modify the existing one-run differential privacy auditing method to work with NIDs, eliminating the need for retraining by treating NIDs as natural canaries and generating corresponding generated identifiers (GIDs) for ranking-based inference. This adaptation achieves tighter privacy bounds with reduced sample complexity.

3 retrieved papers
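The guessing game underlying this kind of audit can be sketched as follows. This is a deliberately simplified pairwise variant (an assumption, not the paper's adapted one-run procedure): each trained-on NID is paired with one fresh GID, the attacker guesses that the lower-loss string is the member, and the guessing accuracy p is converted into a crude pure-DP lower bound eps >= ln(p / (1 - p)), with no confidence intervals and delta = 0:

```python
import math
from typing import Sequence


def audit_epsilon_lower_bound(
    nid_scores: Sequence[float],
    gid_scores: Sequence[float],
) -> float:
    """Pairwise auditing sketch: nid_scores[i] and gid_scores[i] are the model's
    losses on a trained-on NID and its freshly generated GID. The attacker
    guesses the lower-loss string is the member; accuracy p above chance is
    converted to the pure-DP point estimate ln(p / (1 - p))."""
    correct = sum(n < g for n, g in zip(nid_scores, gid_scores))
    p = correct / len(nid_scores)
    if p <= 0.5:
        return 0.0  # attacker is at or below chance: no measurable leakage
    if p >= 1.0:
        return math.inf  # perfect guessing: bound is vacuous at this sample size
    return math.log(p / (1.0 - p))
```

A rigorous audit would replace the point estimate with a high-confidence bound over the number of correct guesses, as in one-run auditing frameworks.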
Practical dataset inference using NIDs

The authors enable dataset inference for any suspect dataset containing NIDs by generating same-distribution held-out data from the NIDs themselves, removing the requirement for a private non-member held-out dataset. They also introduce a ranking-based test to improve efficiency.

8 retrieved papers
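The dataset-inference step compares the model's behavior on suspect NIDs against generated same-distribution held-out strings. As an illustration only (a generic Mann-Whitney rank-sum approximation, not the paper's specific ranking-based test), one can count pairs where a suspect NID has strictly lower loss than a generated held-out string and form a normal-approximation z-score:

```python
import math
from typing import Sequence


def dataset_inference_z(
    suspect_losses: Sequence[float],
    heldout_losses: Sequence[float],
) -> float:
    """Rank-based dataset-inference sketch: U counts (suspect, held-out) pairs
    where the suspect string has strictly lower loss. Under the null (suspect
    set not trained on), U is centered at n1*n2/2; a large positive z suggests
    the suspect dataset was part of training."""
    n1, n2 = len(suspect_losses), len(heldout_losses)
    u = sum(s < h for s in suspect_losses for h in heldout_losses)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mu) / sigma
```

Because the held-out strings are generated from the NID format itself, this test needs no private non-member dataset, which is the practical gap the contribution targets.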

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Natural identifiers (NIDs) for post-hoc privacy auditing


Contribution

Adapted one-run DP auditing framework using NIDs


Contribution

Practical dataset inference using NIDs
