Hallucination Begins Where Saliency Drops

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: LVLMs-Saliency; Saliency-Guided Rejection Sampling; Local Coherence Reinforcement; Hallucination
Abstract:

Recent studies have investigated attention dynamics in large vision language models (LVLMs), yet existing methods remain limited in reliably distinguishing hallucinated from correct outputs, primarily because they rely solely on forward-pass attention, ignoring gradient-based signals that reveal how token influence propagates through the model. To bridge this gap, we introduce LVLMs-Saliency, a gradient-aware diagnostic tool that quantifies the grounding strength of each output token by fusing attention weights with their gradients. Through this analysis, we identify a decisive pattern: hallucinations occur when prior output tokens show low saliency to the next-token prediction, indicating a failure of contextual memory. Building on this insight, we propose a dual-mechanism inference-time framework: (1) Saliency-Guided Rejection Sampling (SGRS), which dynamically filters candidate tokens during decoding by rejecting those with saliency below a context-adaptive threshold, thereby preventing coherence-breaking tokens from entering the sequence; and (2) Local Coherence Reinforcement (LocoRE), a lightweight plug-and-play module that strengthens attention from the current token to its most recent outputs, actively counteracting the "forgetting" behavior identified by LVLMs-Saliency. Experimental results demonstrate that our method significantly reduces hallucinations across multiple LVLMs, offering a robust and interpretable solution to improve model reliability.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a gradient-aware diagnostic tool (LVLMs-Saliency) and two inference-time mitigation techniques (SGRS and LocoRE) for hallucination reduction in large vision language models. It resides in the Mechanistic Analysis of Hallucination Causes leaf, which contains only three papers total. This is a relatively sparse research direction compared to more crowded branches like Contrastive Decoding Methods or Object Hallucination Evaluation, suggesting the mechanistic perspective on hallucination causation remains underexplored despite its foundational importance for understanding model failures.

The taxonomy reveals dense activity in inference-time interventions (four subcategories with multiple papers each) and training-based approaches (five subcategories), while mechanistic analysis remains lean. Neighboring leaves include Specialized Hallucination Evaluation and evaluation benchmarks, which measure hallucinations empirically but do not probe internal mechanisms. The paper bridges mechanistic understanding and practical intervention: its diagnostic tool analyzes attention-gradient patterns (mechanistic), while SGRS and LocoRE apply these insights during decoding (inference-time). This dual focus spans categories, connecting the sparse mechanistic-analysis leaf with the crowded decoding-intervention space.

Among the thirty candidates examined, the gradient-aware diagnostic contribution (Contribution A) shows overlap: two of the ten candidates examined for it can refute it, indicating that prior work exists on attention-gradient analysis for hallucination detection. The two mitigation techniques appear more novel: for SGRS and LocoRE, ten candidates each were examined, with zero refutations found. This suggests the diagnostic mechanism has precedent within the limited search scope, while the specific decoding strategies (rejection sampling via saliency thresholds, coherence reinforcement modules) may represent less-explored territory within the examined candidate pool.

Based on the limited search of thirty semantically similar papers, the mechanistic diagnostic tool builds on existing attention-gradient work, while the inference-time mitigation techniques show fewer overlaps. The sparse mechanistic analysis leaf and the absence of refutations for SGRS/LocoRE suggest potential novelty, though the restricted search scope cannot confirm exhaustive uniqueness. The paper's positioning across mechanistic understanding and practical decoding intervention reflects an underrepresented bridge in current literature.

Taxonomy

50 Core-task Taxonomy Papers
3 Claimed Contributions
30 Contribution Candidate Papers Compared
2 Refutable Papers

Research Landscape Overview

Core task: Mitigating hallucinations in large vision language models. The field has organized itself around several complementary strategies. Hallucination Analysis and Evaluation establishes diagnostic foundations, examining root causes and developing benchmarks such as Hallusionbench[36] and Object Hallucination Analysis[3]. Inference-Time Decoding Interventions modify generation on-the-fly through techniques like Visual Contrastive Decoding[37] and CLIP-Guided Decoding[43], while Training-Based Mitigation Approaches reshape model behavior via methods such as V-DPO[39] and Reflective Instruction Tuning[22]. Post-Hoc Correction and Detection tools like Woodpecker[10] refine outputs after generation, and Domain-Specific and Application-Oriented Mitigation addresses specialized contexts including medical applications. Cross-Cutting Perspectives and Related Work situates these efforts within broader foundation model challenges, as surveyed in Hallucination Survey[1] and Visual Hallucination Survey[7].

Recent mechanistic investigations probe how hallucinations emerge within model architectures, contrasting with purely empirical mitigation strategies. Works like Information Storage Transfer[25] and Visual Object Analysis[41] dissect internal representations to understand failure modes at a granular level. Saliency Drops[0] fits naturally within this mechanistic branch, analyzing how attention patterns and saliency shifts contribute to hallucinatory outputs, complementing neighboring studies that trace information flow and object-level processing. This mechanistic perspective differs from intervention-focused approaches like Image-Biased Decoding[6] or training adjustments in MH-PEFT[18], offering interpretability that can inform both decoding strategies and training objectives.

The interplay between understanding root causes and deploying practical fixes remains a central tension, with mechanistic analyses providing theoretical grounding for the diverse mitigation techniques proliferating across inference-time, training-based, and post-hoc correction branches.

Claimed Contributions

LVLMs-Saliency: gradient-aware diagnostic tool for hallucination detection

The authors propose LVLMs-Saliency, an unsupervised metric that combines attention weights with their gradients to measure how strongly each previously generated output token influences the next token prediction. This reveals a pattern where hallucinations occur when prior output tokens show low saliency to the next token, indicating contextual memory failure.
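The fusion of attention weights with their gradients can be sketched as follows. This is a minimal illustration, not the paper's exact formula: the fusion rule |a · ∂L/∂a| follows common attention-gradient saliency practice, and `token_saliency` / `low_saliency_flag` are hypothetical names; the attention weights and their gradients are assumed to be extracted from the model elsewhere.

```python
# Hedged sketch of an attention-gradient saliency score per prior output token.
# attn[i]: attention weight from the current query position to prior token i.
# grad[i]: gradient of the next-token loss w.r.t. attn[i] (precomputed).

def token_saliency(attn, grad):
    """Fuse attention with its gradient: s_i = |a_i * g_i| (assumed rule)."""
    return [abs(a * g) for a, g in zip(attn, grad)]

def low_saliency_flag(saliency, threshold):
    """Flag a potential hallucination when every prior output token's
    saliency falls below the threshold (the contextual-memory failure
    pattern described above)."""
    return all(s < threshold for s in saliency)

# Toy usage with made-up numbers
attn = [0.40, 0.30, 0.20, 0.10]
grad = [0.05, -0.02, 0.01, 0.00]
scores = token_saliency(attn, grad)   # [0.02, 0.006, 0.002, 0.0]
print(low_saliency_flag(scores, threshold=0.05))  # True: all prior tokens weak
```

In a real pipeline the gradients would come from a backward pass through the attention map; here they are stand-in values to show the fusion and flagging logic only.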

10 retrieved papers
Can Refute
Saliency-Guided Rejection Sampling (SGRS)

SGRS is a proactive filtering mechanism that evaluates candidate output tokens before commitment by computing their saliency and rejecting those with weak contextual dependencies (low saliency below a context-adaptive threshold). This prevents coherence-breaking tokens from entering the sequence and triggering hallucinations.
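The reject-and-resample loop can be sketched as below. This is a simplified illustration under stated assumptions: the paper does not specify the threshold rule, so `adaptive_threshold` here uses a fraction of the mean saliency of recently accepted tokens as a stand-in, and `sgrs_pick` is a hypothetical name.

```python
import random

def adaptive_threshold(recent_saliencies, factor=0.5):
    """Context-adaptive threshold: a fixed fraction of the mean saliency of
    recently accepted tokens (an assumed rule, for illustration only)."""
    if not recent_saliencies:
        return 0.0
    return factor * sum(recent_saliencies) / len(recent_saliencies)

def sgrs_pick(candidates, recent_saliencies, max_tries=10, rng=random):
    """Sample a token by probability, rejecting draws whose saliency falls
    below the context-adaptive threshold; fall back to the highest-saliency
    candidate if every draw is rejected.

    candidates: list of (token, probability, saliency) triples.
    """
    thr = adaptive_threshold(recent_saliencies)
    probs = [p for _, p, _ in candidates]
    for _ in range(max_tries):
        token, _, saliency = rng.choices(candidates, weights=probs, k=1)[0]
        if saliency >= thr:
            return token  # accepted: strong enough contextual dependency
    # every draw was rejected: keep the most strongly grounded candidate
    return max(candidates, key=lambda c: c[2])[0]
```

The fallback to the highest-saliency candidate is one possible design choice; it guarantees decoding never stalls even when the whole candidate pool sits below the threshold.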

10 retrieved papers
Local Coherence Reinforcement (LocoRE)

LocoRE is a reactive stabilization mechanism that operates after token acceptance by strengthening attention weights from the current query token to the most recent output tokens using a distance-aware gain factor. This ensures the model maintains strong attentional links to its immediate past, counteracting the forgetting behavior identified by LVLMs-Saliency.
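The reinforcement step can be sketched as a post-hoc rescaling of the attention row, followed by renormalization. The linear decay of the gain with distance is an assumed form (the paper only states the gain is distance-aware), and `locore_reinforce`, `window`, and `alpha` are illustrative names, not the authors' API.

```python
def locore_reinforce(attn, window=3, alpha=0.3):
    """Strengthen attention from the current query to its most recent
    `window` output tokens with a distance-aware gain, then renormalize.

    attn: attention weights over prior positions (most recent token last).
    Assumed gain: g(d) = 1 + alpha * (window - d + 1) / window for distance
    d <= window (so the nearest token gets the largest boost), else 1.
    """
    n = len(attn)
    boosted = []
    for i, a in enumerate(attn):
        d = n - i  # distance from the current query; d = 1 is most recent
        if d <= window:
            gain = 1.0 + alpha * (window - d + 1) / window
        else:
            gain = 1.0
        boosted.append(a * gain)
    z = sum(boosted)  # renormalize so the row is still a distribution
    return [b / z for b in boosted]

# Toy usage: a uniform row gets tilted toward the most recent tokens
print(locore_reinforce([0.25, 0.25, 0.25, 0.25]))
```

Renormalizing after the boost keeps the attention row a valid probability distribution, so the module can be dropped into an existing decoder without retraining, matching the plug-and-play framing above.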

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

LVLMs-Saliency: gradient-aware diagnostic tool for hallucination detection

The authors propose LVLMs-Saliency, an unsupervised metric that combines attention weights with their gradients to measure how strongly each previously generated output token influences the next token prediction. This reveals a pattern where hallucinations occur when prior output tokens show low saliency to the next token, indicating contextual memory failure.

Contribution

Saliency-Guided Rejection Sampling (SGRS)

SGRS is a proactive filtering mechanism that evaluates candidate output tokens before commitment by computing their saliency and rejecting those with weak contextual dependencies (low saliency below a context-adaptive threshold). This prevents coherence-breaking tokens from entering the sequence and triggering hallucinations.

Contribution

Local Coherence Reinforcement (LocoRE)

LocoRE is a reactive stabilization mechanism that operates after token acceptance by strengthening attention weights from the current query token to the most recent output tokens using a distance-aware gain factor. This ensures the model maintains strong attentional links to its immediate past, counteracting the forgetting behavior identified by LVLMs-Saliency.