Natural Identifiers for Privacy and Data Audits in Large Language Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: privacy auditing, natural identifiers, dataset inference, differential privacy, LLMs
Abstract:

Assessing the privacy of large language models (LLMs) presents significant challenges. In particular, most existing methods for auditing differential privacy require the insertion of specially crafted canary data during training, making them impractical for auditing already-trained models without costly retraining. Additionally, dataset inference, which audits whether a suspect dataset was used to train a model, is infeasible without access to a private non-member held-out dataset. Such held-out datasets are often unavailable or difficult to construct in real-world cases, since they must come from the same distribution (IID) as the suspect data. These limitations severely hinder scalable, post-hoc audits. To enable such audits, this work introduces natural identifiers (NIDs) as a solution to the above challenges. NIDs are structured random strings, such as cryptographic hashes and shortened URLs, that naturally occur in common LLM training datasets. Their format enables the generation of unlimited additional random strings from the same distribution, which can act as alternative canaries for audits and as same-distribution held-out data for dataset inference. Our evaluation shows that NIDs enable post-hoc differential privacy auditing without any retraining, as well as dataset inference for any suspect dataset containing NIDs, without requiring a private non-member held-out dataset.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces natural identifiers (NIDs)—structured random strings like cryptographic hashes and shortened URLs—as a mechanism for post-hoc privacy and dataset inference auditing in LLMs. It sits within the 'Post-hoc Auditing Without Retraining' leaf of the taxonomy, which contains only two papers total. This is a relatively sparse research direction compared to more crowded areas like membership inference attacks or differential privacy defenses, suggesting the specific problem of auditing already-trained models without retraining remains underexplored despite its practical importance.

The taxonomy reveals that this work bridges two broader branches: 'Privacy Auditing and Measurement Frameworks' (its parent category) and 'Privacy Attack Methods and Mechanisms' (which includes membership inference and extraction techniques). Neighboring leaves include 'Auditing LLM Adaptations and Fine-Tuning' and 'Meta-Modeling and Statistical Auditing Approaches', which focus on different auditing contexts or methodologies. The scope note for the parent category explicitly emphasizes 'empirically measuring privacy leakage through systematic auditing', while excluding attack methods themselves—positioning this work as a measurement tool rather than a new attack vector.

Among 18 candidates examined across three contributions, the core NID concept (Contribution 1) shows one refutable candidate out of seven examined, while the adapted DP auditing framework (Contribution 2, three candidates) and dataset inference application (Contribution 3, eight candidates) show no clear refutations. The limited search scope means these statistics reflect top-K semantic matches rather than exhaustive coverage. The single refutable candidate for NIDs suggests some prior exploration of similar identifier-based approaches, though the specific adaptation to post-hoc auditing and dataset inference may retain novelty within this constrained search.

Based on the limited literature search of 18 candidates, the work appears to address a genuine gap in post-hoc auditing capabilities, particularly for dataset inference without held-out data. However, the sparse taxonomy leaf and single refutable candidate indicate this assessment is preliminary. A more comprehensive search across the broader auditing and attack literature would be needed to fully characterize the novelty of using naturally occurring structured strings for privacy measurement.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 18
Refutable Papers: 1

Research Landscape Overview

Core task: Post-hoc privacy and dataset inference auditing for large language models. The field has organized itself around several complementary branches that together address the lifecycle of privacy concerns in LLMs. Privacy Attack Methods and Mechanisms explores how adversaries can extract training data or infer membership, often through membership inference attacks or prompt-based extraction techniques. Privacy Auditing and Measurement Frameworks develops systematic approaches to quantify privacy risks without requiring model retraining, including works like Privacy Auditing[6] and LLM-PBE[2] that measure empirical privacy leakage. Privacy Defense and Mitigation Strategies focuses on protective mechanisms such as differential privacy during fine-tuning or data deduplication methods like Deduplicating Training Data[4]. Meanwhile, Privacy Risk Surveys and Comprehensive Reviews, including Privacy Risks Survey[1] and LLM Privacy Survey[12], synthesize knowledge across attack vectors and defenses. System-Level and Side-Channel Privacy Risks examines broader vulnerabilities beyond direct model queries, while Privacy in Specialized Domains and Applications addresses domain-specific concerns in healthcare, education, and other sensitive contexts.

A particularly active line of work centers on post-hoc auditing techniques that assess privacy without retraining, balancing practical deployment constraints against rigorous measurement. Natural Identifiers Privacy[0] sits squarely within this auditing-focused cluster, examining how natural identifiers in training data can be exploited for privacy inference. This work shares methodological ground with Privacy Auditing[6], which similarly emphasizes measurement frameworks applicable to deployed models, and contrasts with attack-oriented studies like User Inference Attacks[5] that primarily demonstrate vulnerabilities rather than audit existing systems.
The tension between developing efficient auditing methods and understanding the full spectrum of privacy risks remains central: some efforts prioritize scalable empirical measurement while others explore theoretical bounds or worst-case scenarios. Natural Identifiers Privacy[0] contributes to the former direction, offering practical auditing insights that complement broader surveys like LLM Privacy Review[10] and specialized attack studies.

Claimed Contributions

Natural identifiers (NIDs) for post-hoc privacy auditing

The authors introduce natural identifiers (NIDs), which are structured random strings (such as cryptographic hashes and shortened URLs) that naturally occur in LLM training datasets. NIDs enable the generation of unlimited same-distribution samples, allowing post-hoc privacy audits without retraining models or requiring dedicated held-out datasets.

7 retrieved papers
Can Refute
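The core property claimed for NIDs is that their fixed format lets an auditor draw unlimited fresh strings from the same distribution. A minimal sketch of that generation step, assuming two illustrative formats (64-character hex hashes and 7-character base62 short-URL codes; the paper's actual format definitions may differ):

```python
import secrets
import string

# Illustrative NID-like formats (assumptions, not the paper's exact definitions).
HEX_CHARS = string.digits + "abcdef"
BASE62_CHARS = string.digits + string.ascii_letters


def fresh_sha256_like() -> str:
    """Draw a fresh 64-character hex string, uniform over the SHA-256 output space."""
    return "".join(secrets.choice(HEX_CHARS) for _ in range(64))


def fresh_shorturl_like(length: int = 7) -> str:
    """Draw a fresh base62 code, as used by common URL shorteners."""
    return "".join(secrets.choice(BASE62_CHARS) for _ in range(length))
```

Because each draw is uniform over the same character set and length as NIDs observed in training data, such strings can serve as distribution-matched non-members.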
Adapted one-run DP auditing framework using NIDs

The authors modify the existing one-run differential privacy auditing method to work with NIDs, eliminating the need for retraining by treating NIDs as natural canaries and generating corresponding generated identifiers (GIDs) for ranking-based inference. This adaptation achieves tighter privacy bounds with reduced sample complexity.

3 retrieved papers
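The guessing game underlying this kind of audit can be sketched as follows. This is a deliberately simplified pairwise variant (an assumption, not the paper's adapted one-run procedure): each trained-on NID is paired with one fresh GID, the attacker guesses that the lower-loss string is the member, and the guessing accuracy p is converted into a crude pure-DP lower bound eps >= ln(p / (1 - p)), with no confidence intervals and delta = 0:

```python
import math
from typing import Sequence


def audit_epsilon_lower_bound(
    nid_scores: Sequence[float],
    gid_scores: Sequence[float],
) -> float:
    """Pairwise auditing sketch: nid_scores[i] and gid_scores[i] are the model's
    losses on a trained-on NID and its freshly generated GID. The attacker
    guesses the lower-loss string is the member; accuracy p above chance is
    converted to the pure-DP point estimate ln(p / (1 - p))."""
    correct = sum(n < g for n, g in zip(nid_scores, gid_scores))
    p = correct / len(nid_scores)
    if p <= 0.5:
        return 0.0  # attacker is at or below chance: no measurable leakage
    if p >= 1.0:
        return math.inf  # perfect guessing: bound is vacuous at this sample size
    return math.log(p / (1.0 - p))
```

A rigorous audit would replace the point estimate with a high-confidence bound over the number of correct guesses, as in one-run auditing frameworks.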
Practical dataset inference using NIDs

The authors enable dataset inference for any suspect dataset containing NIDs by generating same-distribution held-out data from the NIDs themselves, removing the requirement for a private non-member held-out dataset. They also introduce a ranking-based test to improve efficiency.

8 retrieved papers
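The dataset-inference step compares the model's behavior on suspect NIDs against generated same-distribution held-out strings. As an illustration only (a generic Mann-Whitney rank-sum approximation, not the paper's specific ranking-based test), one can count pairs where a suspect NID has strictly lower loss than a generated held-out string and form a normal-approximation z-score:

```python
import math
from typing import Sequence


def dataset_inference_z(
    suspect_losses: Sequence[float],
    heldout_losses: Sequence[float],
) -> float:
    """Rank-based dataset-inference sketch: U counts (suspect, held-out) pairs
    where the suspect string has strictly lower loss. Under the null (suspect
    set not trained on), U is centered at n1*n2/2; a large positive z suggests
    the suspect dataset was part of training."""
    n1, n2 = len(suspect_losses), len(heldout_losses)
    u = sum(s < h for s in suspect_losses for h in heldout_losses)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mu) / sigma
```

Because the held-out strings are generated from the NID format itself, this test needs no private non-member dataset, which is the practical gap the contribution targets.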

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Natural identifiers (NIDs) for post-hoc privacy auditing


Contribution

Adapted one-run DP auditing framework using NIDs


Contribution

Practical dataset inference using NIDs
