Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems by Exploiting Knowledge Asymmetry

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: RAG, knowledge asymmetry, privacy extraction, cross-domain generalization
Abstract:

Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating external knowledge bases, significantly improving their factual accuracy and contextual relevance. However, this integration also introduces new privacy vulnerabilities. Existing privacy attacks on RAG systems may trigger data leakage, but they often fail to accurately isolate knowledge base-derived content within mixed responses and perform poorly in multi-domain settings. In this paper, we propose a novel black-box attack framework that exploits knowledge asymmetry between RAG systems and standard LLMs to enable fine-grained privacy extraction across heterogeneous knowledge domains. Our approach decomposes adversarial queries to maximize information divergence between the models, then applies semantic relationship scoring to resolve lexical and syntactic ambiguities. These features are used to train a neural classifier capable of precisely identifying response segments that contain private or sensitive information. Unlike prior methods, our framework generalizes to unseen domains through iterative refinement without requiring prior knowledge of the corpus. Experimental results show that our method achieves over 90% extraction accuracy in single-domain scenarios and 80% in multi-domain settings, outperforming baselines by over 30% in key evaluation metrics. These results represent the first systematic solution for fine-grained privacy localization in RAG systems, exposing critical security vulnerabilities and paving the way for stronger, more resilient defenses.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a black-box attack framework that exploits knowledge asymmetry between RAG systems and standard LLMs to extract private information from knowledge bases. It resides in the 'Direct Data Extraction Attacks' leaf under 'Privacy Attack Methods and Vulnerability Analysis', which contains six papers total. This leaf focuses on adversarial techniques that directly extract or reconstruct private data through query manipulation. The research direction is moderately populated, indicating active interest in understanding how RAG retrieval mechanisms can be exploited to leak sensitive content from external knowledge sources.

The taxonomy reveals closely related attack categories in sibling leaves: 'Inference and Leakage Detection Attacks' (four papers) addresses membership inference and indirect leakage signals, while 'Multimodal and Domain-Specific Privacy Vulnerabilities' (two papers) extends privacy analysis beyond text. The paper's focus on knowledge asymmetry and fine-grained extraction connects it to broader themes in 'Privacy Risk Assessment and Trustworthiness Analysis' (nine papers), which examines systemic vulnerabilities. Its black-box approach contrasts with defense-oriented branches like 'Differential Privacy and Formal Privacy Guarantees' (five papers) and 'Encryption and Secure Computation Techniques' (six papers), highlighting the attack-defense dynamic central to this field.

Among sixteen candidates examined across three contributions, none were found to clearly refute the proposed methods. The core black-box framework was compared against ten candidates with zero refutations, suggesting limited prior work on knowledge asymmetry exploitation for fine-grained extraction. The adversarial query decomposition component was compared against two candidates, and the neural classifier using NLI-enhanced features against four, likewise without refutation. Within this limited search scope (the sixteen most semantically similar papers), no direct overlaps were detected, though the small candidate pool means the analysis cannot confirm broader field-level novelty.

Based on the restricted literature search, the work appears to occupy a distinct position within direct extraction attacks, particularly in its emphasis on multi-domain generalization and semantic relationship scoring. The absence of refutations among sixteen candidates suggests the specific combination of techniques may be novel, though the limited scope prevents definitive conclusions about incremental versus transformative contributions. The taxonomy context shows this is an active research area with established attack and defense paradigms, positioning the work as a refinement within ongoing adversarial exploration of RAG privacy vulnerabilities.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 16
Refutable Papers: 0

Research Landscape Overview

Core task: privacy extraction from retrieval-augmented generation systems. The field addresses how RAG architectures, which combine large language models with external knowledge retrieval, introduce novel privacy vulnerabilities and how to mitigate them. The taxonomy organizes research into several main branches:

- Privacy Attack Methods and Vulnerability Analysis explores direct data extraction, backdoor attacks, and membership inference techniques that exploit RAG's retrieval mechanisms.
- Privacy Protection and Defense Mechanisms develops countermeasures such as differential privacy adaptations, perturbation strategies, and privacy-aware decoding.
- Federated and Distributed Privacy-Preserving RAG Architectures investigates decentralized designs like C-FedRAG[10] and HyFedRAG[16] that distribute knowledge bases while preserving confidentiality.
- Privacy Risk Assessment and Trustworthiness Analysis examines broader trustworthiness concerns as surveyed in Trustworthy RAG Survey[5] and Trustworthiness RAG Survey[1].
- Privacy-Aware RAG System Design and Deployment focuses on practical implementations including on-device solutions and regulatory compliance frameworks.

Additional branches cover evaluation datasets, domain-specific applications with privacy considerations, and general RAG background.

A particularly active line of work centers on direct extraction attacks that leverage the asymmetry between what RAG systems retrieve and what users should access. Knowledge Asymmetry Privacy[0] investigates how adversaries can exploit this gap to infer sensitive information from retrieved contexts, positioning itself alongside RAG-Thief[6] and Knowledge Asymmetry Exploitation[21], which similarly probe how retrieval patterns leak private data. These attack-focused studies contrast sharply with defense-oriented approaches like Differential Privacy RAG[3] and Local Entity Perturbation[8], which add noise or modify retrieval to limit exposure.
Meanwhile, works such as DEAL[14] and Silent Leaks[35] reveal subtler vulnerabilities in how RAG systems inadvertently disclose information through seemingly benign outputs. The interplay between these attack and defense branches highlights ongoing tensions: stronger extraction methods push the boundaries of what RAG systems can inadvertently reveal, while privacy-preserving architectures and mitigation strategies seek to close these gaps without sacrificing retrieval utility.

Claimed Contributions

Black-box attack framework exploiting knowledge asymmetry for fine-grained privacy extraction

The authors introduce a framework that leverages the systematic differences in responses between RAG systems (which access external knowledge bases) and standard LLMs (which rely only on pre-trained knowledge) to precisely identify which specific sentences in RAG outputs contain private information from the knowledge base, rather than just detecting that leakage occurred.

10 retrieved papers
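
The knowledge-asymmetry comparison described in this contribution can be sketched in miniature. Everything below is a hypothetical illustration, not the paper's implementation: the `jaccard` token-overlap score is a crude stand-in for the embedding- and NLI-based features the authors actually use, and the 0.3 threshold and toy sentences are invented for the example.

```python
# Minimal sketch of the knowledge-asymmetry idea: a sentence in the RAG
# response that diverges strongly from the standard LLM's answer to the
# same query is flagged as likely knowledge-base-derived.

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two sentences (a crude stand-in for
    the paper's embedding/NLI similarity features)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def flag_private_sentences(rag_sentences, llm_response, threshold=0.3):
    """Flag RAG-response sentences whose best overlap with the baseline
    LLM response falls below `threshold` (an assumed hyperparameter)."""
    llm_sentences = llm_response.split(". ")
    flagged = []
    for s in rag_sentences:
        best = max(jaccard(s, t) for t in llm_sentences)
        flagged.append((s, best < threshold))
    return flagged

# Toy example: the second RAG sentence has no counterpart in the LLM
# output, so it is flagged as likely coming from the knowledge base.
rag = ["Aspirin is a common pain reliever",
       "Patient J. Doe was prescribed 81mg daily on 2023-04-01"]
llm = "Aspirin is a common pain reliever. It reduces fever and inflammation"
for sentence, is_private in flag_private_sentences(rag, llm):
    print(is_private, sentence)
```

The key design point the sketch preserves is that the attacker needs only black-box access to both systems: the signal comes entirely from comparing the two responses, not from inspecting the retriever.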
Adversarial query decomposition with iterative refinement for multi-domain scenarios

The method splits queries into two components (q1 and q2) to maximize response differences between RAG and standard LLMs, and introduces an iterative refinement strategy that uses broad initial queries to detect privacy leakage, then refines subsequent queries based on extracted privacy fragments to work in zero-prior-knowledge multi-domain settings.

2 retrieved papers
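
The iterative refinement loop summarized above can be sketched as follows. This is a schematic under stated assumptions: `rag_query` is a stub standing in for a black-box call to the target RAG system, the (q1, q2) prompt templates and the fragment heuristic are invented for illustration, and the real attack would also query a reference LLM and compare responses (not shown).

```python
# Sketch of zero-prior-knowledge extraction: broad probes (q1) detect
# leakage, then follow-up queries (q2, and later rounds) are refined
# using fragments already extracted.

def rag_query(q: str) -> str:
    # Stub: pretend the RAG system leaks one record per matching topic.
    fake_kb = {"billing": "Invoice #1001 for J. Doe, $250",
               "medical": "Patient J. Doe, diagnosis: hypertension"}
    return next((v for k, v in fake_kb.items() if k in q), "No information found")

def extract_fragments(response: str) -> list[str]:
    # Stub fragment extractor: keep responses that look like records.
    return [response] if "#" in response or ":" in response else []

def iterative_extraction(seed_topics, rounds=2):
    """Broad probes first, then refined follow-ups seeded by fragments
    extracted in earlier rounds."""
    extracted, queue = [], list(seed_topics)
    for _ in range(rounds):
        next_queue = []
        for topic in queue:
            q1 = f"Summarize everything about {topic}"          # broad probe
            q2 = f"Quote the exact records related to {topic}"  # extraction trigger
            for q in (q1, q2):
                for frag in extract_fragments(rag_query(q)):
                    if frag not in extracted:
                        extracted.append(frag)
                        next_queue.append(frag.split(",")[0])   # refine on fragment
        queue = next_queue
    return extracted

print(iterative_extraction(["billing", "medical"]))
```

The loop structure mirrors the claim: no prior knowledge of the corpus is needed, because each round's queries are generated from what the previous round leaked.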
Neural classifier using NLI-enhanced similarity features for privacy sentence identification

The authors develop a classification approach that combines sentence embeddings with natural language inference models to compute refined similarity scores between RAG and LLM responses, then trains a neural network on these features to accurately distinguish sentences containing knowledge-base content from those generated by the LLM alone.

4 retrieved papers
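
The feature construction behind this classifier can be sketched in toy form. In a real implementation the two features would come from a sentence-embedding model (cosine similarity) and an NLI model (entailment probability); both are replaced here with crude lexical stand-ins so the shape of the pipeline stays visible, and the two-feature logistic scorer with made-up weights is illustrative, not the paper's trained network.

```python
# Per-sentence features: (max embedding similarity, max NLI entailment)
# against the baseline LLM response; low values on both suggest the
# sentence came from the private knowledge base.
import math

def embed_sim(a: str, b: str) -> float:
    # Stand-in for cosine similarity between sentence embeddings.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / math.sqrt(len(ta) * len(tb)) if ta and tb else 0.0

def nli_entail(premise: str, hypothesis: str) -> float:
    # Stand-in for an NLI entailment probability: fraction of the
    # hypothesis's tokens covered by the premise.
    tp, th = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(tp & th) / len(th) if th else 0.0

def features(rag_sentence: str, llm_response: str) -> list[float]:
    """Two-dimensional feature vector for one RAG-response sentence."""
    llm_sents = llm_response.split(". ")
    return [max(embed_sim(s, rag_sentence) for s in llm_sents),
            max(nli_entail(s, rag_sentence) for s in llm_sents)]

def is_private(feat, w=(-3.0, -3.0), b=2.0) -> bool:
    # Illustrative logistic scorer: low similarity/entailment pushes the
    # score toward "private" (weights are made up for the sketch).
    z = b + w[0] * feat[0] + w[1] * feat[1]
    return 1 / (1 + math.exp(-z)) > 0.5
```

The NLI component matters because embedding similarity alone conflates paraphrase with contradiction; an entailment score helps resolve the lexical and syntactic ambiguities the abstract mentions.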

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Black-box attack framework exploiting knowledge asymmetry for fine-grained privacy extraction

The authors introduce a framework that leverages the systematic differences in responses between RAG systems (which access external knowledge bases) and standard LLMs (which rely only on pre-trained knowledge) to precisely identify which specific sentences in RAG outputs contain private information from the knowledge base, rather than just detecting that leakage occurred.

Contribution

Adversarial query decomposition with iterative refinement for multi-domain scenarios

The method splits queries into two components (q1 and q2) to maximize response differences between RAG and standard LLMs, and introduces an iterative refinement strategy that uses broad initial queries to detect privacy leakage, then refines subsequent queries based on extracted privacy fragments to work in zero-prior-knowledge multi-domain settings.

Contribution

Neural classifier using NLI-enhanced similarity features for privacy sentence identification

The authors develop a classification approach that combines sentence embeddings with natural language inference models to compute refined similarity scores between RAG and LLM responses, then trains a neural network on these features to accurately distinguish sentences containing knowledge-base content from those generated by the LLM alone.
