Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems by Exploiting Knowledge Asymmetry

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: RAG, knowledge asymmetry, privacy extraction, cross-domain generalization
Abstract:

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating external knowledge bases, significantly improving their factual accuracy and contextual relevance. However, this integration also introduces new privacy vulnerabilities. Existing privacy attacks on RAG systems may trigger data leakage, but they often fail to accurately isolate knowledge base-derived content within mixed responses and perform poorly in multi-domain settings. In this paper, we propose a novel black-box attack framework that exploits knowledge asymmetry between RAG systems and standard LLMs to enable fine-grained privacy extraction across heterogeneous knowledge domains. Our approach decomposes adversarial queries to maximize information divergence between the models, then applies semantic relationship scoring to resolve lexical and syntactic ambiguities. These features are used to train a neural classifier capable of precisely identifying response segments that contain private or sensitive information. Unlike prior methods, our framework generalizes to unseen domains through iterative refinement without requiring prior knowledge of the corpus. Experimental results show that our method achieves over 90% extraction accuracy in single-domain scenarios and 80% in multi-domain settings, outperforming baselines by over 30% in key evaluation metrics. These results represent the first systematic solution for fine-grained privacy localization in RAG systems, exposing critical security vulnerabilities and paving the way for stronger, more resilient defenses.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a black-box attack framework that exploits knowledge asymmetry between RAG systems and standard LLMs to extract private information from knowledge bases. It resides in the 'Direct Data Extraction Attacks' leaf under 'Privacy Attack Methods and Vulnerability Analysis', which contains six papers total. This leaf focuses on adversarial techniques that directly extract or reconstruct private data through query manipulation. The research direction is moderately populated, indicating active interest in understanding how RAG retrieval mechanisms can be exploited to leak sensitive content from external knowledge sources.

The taxonomy reveals closely related attack categories in sibling leaves: 'Inference and Leakage Detection Attacks' (four papers) addresses membership inference and indirect leakage signals, while 'Multimodal and Domain-Specific Privacy Vulnerabilities' (two papers) extends privacy analysis beyond text. The paper's focus on knowledge asymmetry and fine-grained extraction connects it to broader themes in 'Privacy Risk Assessment and Trustworthiness Analysis' (nine papers), which examines systemic vulnerabilities. Its black-box approach contrasts with defense-oriented branches like 'Differential Privacy and Formal Privacy Guarantees' (five papers) and 'Encryption and Secure Computation Techniques' (six papers), highlighting the attack-defense dynamic central to this field.

Among sixteen candidates examined across three contributions, none were found to clearly refute the proposed methods. The core black-box framework was compared against ten candidates with zero refutations, suggesting limited prior work on knowledge asymmetry exploitation for fine-grained extraction. The adversarial query decomposition component was compared against two candidates, and the neural classifier using NLI-enhanced features against four, likewise without refutation. Within this limited search scope (the sixteen most semantically similar papers), no direct overlaps were detected, though the small candidate pool means the analysis cannot confirm broader field-level novelty.

Based on the restricted literature search, the work appears to occupy a distinct position within direct extraction attacks, particularly in its emphasis on multi-domain generalization and semantic relationship scoring. The absence of refutations among sixteen candidates suggests the specific combination of techniques may be novel, though the limited scope prevents definitive conclusions about incremental versus transformative contributions. The taxonomy context shows this is an active research area with established attack and defense paradigms, positioning the work as a refinement within ongoing adversarial exploration of RAG privacy vulnerabilities.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 16
Refutable Papers: 0

Research Landscape Overview

Core task: privacy extraction from retrieval-augmented generation systems. The field addresses how RAG architectures, which combine large language models with external knowledge retrieval, introduce novel privacy vulnerabilities and how to mitigate them. The taxonomy organizes research into several main branches:

- Privacy Attack Methods and Vulnerability Analysis explores direct data extraction, backdoor attacks, and membership inference techniques that exploit RAG's retrieval mechanisms.
- Privacy Protection and Defense Mechanisms develops countermeasures such as differential privacy adaptations, perturbation strategies, and privacy-aware decoding.
- Federated and Distributed Privacy-Preserving RAG Architectures investigates decentralized designs like C-FedRAG[10] and HyFedRAG[16] that distribute knowledge bases while preserving confidentiality.
- Privacy Risk Assessment and Trustworthiness Analysis examines broader trustworthiness concerns as surveyed in Trustworthy RAG Survey[5] and Trustworthiness RAG Survey[1].
- Privacy-Aware RAG System Design and Deployment focuses on practical implementations including on-device solutions and regulatory compliance frameworks.

Additional branches cover evaluation datasets, domain-specific applications with privacy considerations, and general RAG background.

A particularly active line of work centers on direct extraction attacks that leverage the asymmetry between what RAG systems retrieve and what users should access. Knowledge Asymmetry Privacy[0] investigates how adversaries can exploit this gap to infer sensitive information from retrieved contexts, positioning itself alongside RAG-Thief[6] and Knowledge Asymmetry Exploitation[21], which similarly probe how retrieval patterns leak private data. These attack-focused studies contrast sharply with defense-oriented approaches like Differential Privacy RAG[3] and Local Entity Perturbation[8], which add noise or modify retrieval to limit exposure.
Meanwhile, works such as DEAL[14] and Silent Leaks[35] reveal subtler vulnerabilities in how RAG systems inadvertently disclose information through seemingly benign outputs. The interplay between these attack and defense branches highlights ongoing tensions: stronger extraction methods push the boundaries of what RAG systems can inadvertently reveal, while privacy-preserving architectures and mitigation strategies seek to close these gaps without sacrificing retrieval utility.

Claimed Contributions

Black-box attack framework exploiting knowledge asymmetry for fine-grained privacy extraction

The authors introduce a framework that leverages the systematic differences in responses between RAG systems (which access external knowledge bases) and standard LLMs (which rely only on pre-trained knowledge) to precisely identify which specific sentences in RAG outputs contain private information from the knowledge base, rather than just detecting that leakage occurred.

10 retrieved papers
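
The knowledge-asymmetry comparison described in this contribution can be sketched in miniature. Everything below is a hypothetical illustration, not the paper's implementation: the `jaccard` token-overlap score is a crude stand-in for the embedding- and NLI-based features the authors actually use, and the 0.3 threshold and toy sentences are invented for the example.

```python
# Minimal sketch of the knowledge-asymmetry idea: a sentence in the RAG
# response that diverges strongly from the standard LLM's answer to the
# same query is flagged as likely knowledge-base-derived.

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two sentences (a crude stand-in for
    the paper's embedding/NLI similarity features)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def flag_private_sentences(rag_sentences, llm_response, threshold=0.3):
    """Flag RAG-response sentences whose best overlap with the baseline
    LLM response falls below `threshold` (an assumed hyperparameter)."""
    llm_sentences = llm_response.split(". ")
    flagged = []
    for s in rag_sentences:
        best = max(jaccard(s, t) for t in llm_sentences)
        flagged.append((s, best < threshold))
    return flagged

# Toy example: the second RAG sentence has no counterpart in the LLM
# output, so it is flagged as likely coming from the knowledge base.
rag = ["Aspirin is a common pain reliever",
       "Patient J. Doe was prescribed 81mg daily on 2023-04-01"]
llm = "Aspirin is a common pain reliever. It reduces fever and inflammation"
for sentence, is_private in flag_private_sentences(rag, llm):
    print(is_private, sentence)
```

The key design point the sketch preserves is that the attacker needs only black-box access to both systems: the signal comes entirely from comparing the two responses, not from inspecting the retriever.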
Adversarial query decomposition with iterative refinement for multi-domain scenarios

The method splits queries into two components (q1 and q2) to maximize response differences between RAG and standard LLMs, and introduces an iterative refinement strategy that uses broad initial queries to detect privacy leakage, then refines subsequent queries based on extracted privacy fragments to work in zero-prior-knowledge multi-domain settings.

2 retrieved papers
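
The iterative refinement loop summarized above can be sketched as follows. This is a schematic under stated assumptions: `rag_query` is a stub standing in for a black-box call to the target RAG system, the (q1, q2) prompt templates and the fragment heuristic are invented for illustration, and the real attack would also query a reference LLM and compare responses (not shown).

```python
# Sketch of zero-prior-knowledge extraction: broad probes (q1) detect
# leakage, then follow-up queries (q2, and later rounds) are refined
# using fragments already extracted.

def rag_query(q: str) -> str:
    # Stub: pretend the RAG system leaks one record per matching topic.
    fake_kb = {"billing": "Invoice #1001 for J. Doe, $250",
               "medical": "Patient J. Doe, diagnosis: hypertension"}
    return next((v for k, v in fake_kb.items() if k in q), "No information found")

def extract_fragments(response: str) -> list[str]:
    # Stub fragment extractor: keep responses that look like records.
    return [response] if "#" in response or ":" in response else []

def iterative_extraction(seed_topics, rounds=2):
    """Broad probes first, then refined follow-ups seeded by fragments
    extracted in earlier rounds."""
    extracted, queue = [], list(seed_topics)
    for _ in range(rounds):
        next_queue = []
        for topic in queue:
            q1 = f"Summarize everything about {topic}"          # broad probe
            q2 = f"Quote the exact records related to {topic}"  # extraction trigger
            for q in (q1, q2):
                for frag in extract_fragments(rag_query(q)):
                    if frag not in extracted:
                        extracted.append(frag)
                        next_queue.append(frag.split(",")[0])   # refine on fragment
        queue = next_queue
    return extracted

print(iterative_extraction(["billing", "medical"]))
```

The loop structure mirrors the claim: no prior knowledge of the corpus is needed, because each round's queries are generated from what the previous round leaked.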
Neural classifier using NLI-enhanced similarity features for privacy sentence identification

The authors develop a classification approach that combines sentence embeddings with natural language inference models to compute refined similarity scores between RAG and LLM responses, then trains a neural network on these features to accurately distinguish sentences containing knowledge-base content from those generated by the LLM alone.

4 retrieved papers
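
The feature construction behind this classifier can be sketched in toy form. In a real implementation the two features would come from a sentence-embedding model (cosine similarity) and an NLI model (entailment probability); both are replaced here with crude lexical stand-ins so the shape of the pipeline stays visible, and the two-feature logistic scorer with made-up weights is illustrative, not the paper's trained network.

```python
# Per-sentence features: (max embedding similarity, max NLI entailment)
# against the baseline LLM response; low values on both suggest the
# sentence came from the private knowledge base.
import math

def embed_sim(a: str, b: str) -> float:
    # Stand-in for cosine similarity between sentence embeddings.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / math.sqrt(len(ta) * len(tb)) if ta and tb else 0.0

def nli_entail(premise: str, hypothesis: str) -> float:
    # Stand-in for an NLI entailment probability: fraction of the
    # hypothesis's tokens covered by the premise.
    tp, th = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(tp & th) / len(th) if th else 0.0

def features(rag_sentence: str, llm_response: str) -> list[float]:
    """Two-dimensional feature vector for one RAG-response sentence."""
    llm_sents = llm_response.split(". ")
    return [max(embed_sim(s, rag_sentence) for s in llm_sents),
            max(nli_entail(s, rag_sentence) for s in llm_sents)]

def is_private(feat, w=(-3.0, -3.0), b=2.0) -> bool:
    # Illustrative logistic scorer: low similarity/entailment pushes the
    # score toward "private" (weights are made up for the sketch).
    z = b + w[0] * feat[0] + w[1] * feat[1]
    return 1 / (1 + math.exp(-z)) > 0.5
```

The NLI component matters because embedding similarity alone conflates paraphrase with contradiction; an entailment score helps resolve the lexical and syntactic ambiguities the abstract mentions.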

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Black-box attack framework exploiting knowledge asymmetry for fine-grained privacy extraction

The authors introduce a framework that leverages the systematic differences in responses between RAG systems (which access external knowledge bases) and standard LLMs (which rely only on pre-trained knowledge) to precisely identify which specific sentences in RAG outputs contain private information from the knowledge base, rather than just detecting that leakage occurred.

Contribution

Adversarial query decomposition with iterative refinement for multi-domain scenarios

The method splits queries into two components (q1 and q2) to maximize response differences between RAG and standard LLMs, and introduces an iterative refinement strategy that uses broad initial queries to detect privacy leakage, then refines subsequent queries based on extracted privacy fragments to work in zero-prior-knowledge multi-domain settings.

Contribution

Neural classifier using NLI-enhanced similarity features for privacy sentence identification

The authors develop a classification approach that combines sentence embeddings with natural language inference models to compute refined similarity scores between RAG and LLM responses, then trains a neural network on these features to accurately distinguish sentences containing knowledge-base content from those generated by the LLM alone.
