LUMINA: Detecting Hallucinations in RAG Systems with Context–Knowledge Signals

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Hallucination detection, Retrieval-augmented generation, Reliability of LLMs
Abstract:

Retrieval-Augmented Generation (RAG) aims to mitigate hallucinations in large language models (LLMs) by grounding responses in retrieved documents. Yet, RAG-based LLMs still hallucinate even when provided with correct and sufficient context. A growing line of work suggests that this stems from an imbalance between how models use external context and their internal knowledge, and several approaches have attempted to quantify these signals for hallucination detection. However, existing methods require extensive hyperparameter tuning, limiting their generalizability. We propose LUMINA, a novel framework that detects hallucinations in RAG systems through context–knowledge signals: external context utilization is quantified via distributional distance, while internal knowledge utilization is measured by tracking how predicted tokens evolve across transformer layers. We further introduce a framework for statistically validating these measurements. Experiments on common RAG hallucination benchmarks and four open-source LLMs show that LUMINA achieves consistently high AUROC and AUPRC scores, outperforming prior utilization-based methods by up to +13% AUROC on HalluRAG. Moreover, LUMINA remains robust under relaxed assumptions about retrieval quality and model matching, offering both effectiveness and practicality.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LUMINA, a framework for detecting hallucinations in RAG systems by quantifying context-knowledge signals through distributional distance and layer-wise token evolution. It resides in the Context-Knowledge Signal Analysis leaf under Detection Methods Based on Model Internals. Notably, this leaf contains only one paper—LUMINA itself—indicating a sparse research direction within the broader taxonomy of fifty papers. This isolation suggests the specific combination of distributional measures and layer-wise tracking for context-knowledge balance represents a relatively unexplored niche.

The taxonomy reveals that LUMINA's parent branch, Detection Methods Based on Model Internals, also includes Mechanistic Interpretability Approaches with four papers examining attention patterns and layer-wise relevance. Neighboring branches pursue semantic consistency checks (NLI-based detection, multi-perspective analysis) and mitigation strategies (retrieval quality enhancement, adaptive retrieval). LUMINA diverges from these by focusing on internal signal quantification rather than external validation or architectural intervention, positioning it at the intersection of interpretability and detection without crossing into mitigation or post-hoc consistency verification.

Among twenty-four candidates examined, the core LUMINA framework contribution shows two refutable candidates out of ten examined, suggesting some prior work addresses context-knowledge signal analysis. The statistical validation framework contribution found zero refutable candidates across ten examined papers, indicating greater novelty in this methodological aspect. The layer-agnostic measurement approach similarly encountered no refutations among four candidates. These statistics reflect a limited search scope—top-K semantic matches plus citation expansion—rather than exhaustive coverage, meaning additional relevant work may exist beyond the examined set.

Based on the limited search scope, LUMINA appears to occupy a sparsely populated research direction with some overlap in its core framework but greater novelty in its validation methodology and hyperparameter-free design. The single-paper leaf status and low refutation rates across most contributions suggest the work explores a relatively underexplored angle, though the analysis cannot rule out relevant prior work outside the twenty-four candidates examined.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 2

Research Landscape Overview

Core task: Detecting hallucinations in retrieval-augmented generation systems. The field has organized itself around several complementary perspectives. Detection Methods Based on Model Internals probe the inner workings of language models—examining attention patterns, hidden states, and context-knowledge signals—to identify when generated text diverges from retrieved evidence, as seen in approaches like LUMINA[0] and Lrp4rag[1]. Detection Methods Based on Semantic Consistency instead compare outputs against external references or check for logical coherence across multiple generations, exemplified by works such as ReDeEP[3] and Fine-grained Hallucination[21].

Meanwhile, Mitigation Strategies and System Design focus on architectural interventions—adaptive retrieval, corrective mechanisms, and prompt engineering—to prevent hallucinations before they occur, with representative studies including Corrective RAG[10] and Adaptive Retrieval[42]. Domain-Specific Applications tailor these techniques to specialized contexts like medicine (Medical RAG Benchmark[5], MMed-RAG[50]) or customer service (Multilingual Customer Service[37]), while Evaluation Frameworks and Benchmarks provide standardized testbeds (RAGtruth[8], RAG-Check[14]) and Surveys and Taxonomies (Graph RAG Survey[6], RAG Trustworthiness Survey[11]) synthesize emerging best practices.

A central tension runs through the literature: model-internal methods promise early, fine-grained detection by leveraging signals such as attention weights or layer activations, yet they often require white-box access and can be model-specific, whereas semantic-consistency approaches are more portable but may only catch errors post-generation.
LUMINA[0] sits squarely within the Context-Knowledge Signal Analysis branch, analyzing how models internally reconcile retrieved context with their parametric knowledge—a strategy that contrasts with purely output-based validators like ReDeEP[3] or prompt-perturbation techniques (Prompt Perturbation[26]). By focusing on interpretable internal signals, LUMINA[0] aims to bridge the gap between early intervention and broad applicability, addressing open questions about when and why retrieval-augmented systems produce unfaithful outputs. This positioning reflects a broader shift toward understanding not just whether hallucinations occur, but how model internals can reveal their root causes.

Claimed Contributions

LUMINA framework for hallucination detection via context-knowledge signals

The authors introduce LUMINA, a framework that detects hallucinations in retrieval-augmented generation systems by separately quantifying external context utilization (using maximum mean discrepancy between token distributions) and internal knowledge utilization (using information processing rate across layers), without requiring extensive hyperparameter tuning.

Retrieved papers compared: 10 (can refute)
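The external-context signal above is described as a maximum mean discrepancy (MMD) between token distributions. As an illustration only, a minimal biased MMD estimator over probability vectors might look like the following sketch; the RBF kernel, the `gamma` bandwidth, and the Dirichlet toy data are assumptions made for this example, not LUMINA's actual formulation.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Gaussian (RBF) kernel between two probability vectors
    return np.exp(-gamma * np.sum((x - y) ** 2))

def mmd_squared(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between two samples.

    X, Y: arrays of shape (n, d) and (m, d), standing in for
    next-token probability vectors with and without the retrieved
    context. The biased estimator is always non-negative and is
    exactly zero when X and Y are identical.
    """
    k_xx = np.mean([rbf_kernel(a, b, gamma) for a in X for b in X])
    k_yy = np.mean([rbf_kernel(a, b, gamma) for a in Y for b in Y])
    k_xy = np.mean([rbf_kernel(a, b, gamma) for a in X for b in Y])
    return k_xx + k_yy - 2.0 * k_xy

# Toy data: Dirichlet draws as stand-ins for token distributions
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(5), size=8)  # "with context" distributions
Y = rng.dirichlet(np.ones(5), size=8)  # "without context" distributions
score = mmd_squared(X, Y, gamma=0.5)
```

A larger score would indicate a larger shift between the context-conditioned and context-free token distributions, i.e. heavier reliance on the retrieved context under this sketch's assumptions.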
Statistical validation framework for utilization measurements

The authors develop a statistical hypothesis testing framework to validate that their proposed measurements genuinely capture external context and internal knowledge utilization, addressing a limitation of prior work that only verified correlation with hallucination without validating the scores themselves.

Retrieved papers compared: 10 (no refutations)
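The report describes this contribution only as a statistical hypothesis testing framework; a generic two-sample permutation test is one plausible instantiation of the idea of checking that a utilization score genuinely separates two populations. The function below and its toy score populations are illustrative assumptions, not the authors' actual procedure.

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_perm=10000, seed=0):
    """Two-sample permutation test on the absolute difference of means.

    Returns a p-value for the null hypothesis that scores_a and
    scores_b were drawn from the same distribution; the +1 terms
    give the standard small-sample correction.
    """
    rng = np.random.default_rng(seed)
    observed = abs(np.mean(scores_a) - np.mean(scores_b))
    pooled = np.concatenate([scores_a, scores_b])
    n_a = len(scores_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled scores
        diff = abs(np.mean(pooled[:n_a]) - np.mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Toy example: clearly separated utilization scores for
# hallucinated vs. faithful responses should yield a small p-value.
hallucinated = np.array([0.80, 0.90, 0.85, 0.95, 0.88])
faithful = np.array([0.10, 0.20, 0.15, 0.12, 0.18])
p = permutation_test(hallucinated, faithful, n_perm=2000)
```

Rejecting the null here supports the claim that the measured score tracks the intended quantity, rather than merely correlating with hallucination labels by chance.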
Layer-agnostic measurement approach requiring minimal hyperparameter tuning

Unlike prior methods that require selecting specific attention heads and transformer layers through extensive tuning, LUMINA measures utilization signals in a layer-agnostic way that generalizes better across different models and datasets with minimal hyperparameter configuration.

Retrieved papers compared: 4 (no refutations)
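A layer-agnostic signal can be obtained by aggregating a per-layer quantity over all layers instead of tuning a layer index. The sketch below averages each intermediate layer's Jensen–Shannon divergence from the final prediction (logit-lens style); the choice of divergence and the toy distributions are assumptions for illustration, since the paper's exact measurement is not reproduced here.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two probability vectors
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def layer_agnostic_signal(layer_dists):
    """Aggregate a per-layer divergence into one score, no layer selection.

    layer_dists: list of per-layer next-token distributions (logit-lens
    style), ordered from first to last layer. Averaging each intermediate
    layer's divergence from the final prediction removes the need to tune
    a specific layer index per model or dataset.
    """
    final = layer_dists[-1]
    divs = [js_divergence(d, final) for d in layer_dists[:-1]]
    return float(np.mean(divs))

# Toy example: four "layers" over a 3-token vocabulary, where the
# prediction gradually sharpens toward the final answer.
layers = [
    np.array([0.4, 0.3, 0.3]),
    np.array([0.5, 0.3, 0.2]),
    np.array([0.7, 0.2, 0.1]),
    np.array([0.9, 0.05, 0.05]),
]
signal = layer_agnostic_signal(layers)
```

Because the aggregation is a plain mean over all layers, the only remaining choices are the divergence and the readout, which is consistent with the claim of minimal hyperparameter configuration.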

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

LUMINA framework for hallucination detection via context-knowledge signals

The authors introduce LUMINA, a framework that detects hallucinations in retrieval-augmented generation systems by separately quantifying external context utilization (using maximum mean discrepancy between token distributions) and internal knowledge utilization (using information processing rate across layers), without requiring extensive hyperparameter tuning.

Contribution 2

Statistical validation framework for utilization measurements

The authors develop a statistical hypothesis testing framework to validate that their proposed measurements genuinely capture external context and internal knowledge utilization, addressing a limitation of prior work that only verified correlation with hallucination without validating the scores themselves.

Contribution 3

Layer-agnostic measurement approach requiring minimal hyperparameter tuning

Unlike prior methods that require selecting specific attention heads and transformer layers through extensive tuning, LUMINA measures utilization signals in a layer-agnostic way that generalizes better across different models and datasets with minimal hyperparameter configuration.