Hallucination Begins Where Saliency Drops

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: LVLMs-Saliency; Saliency-Guided Rejection Sampling; Local Coherence Reinforcement; Hallucination
Abstract:

Recent studies have investigated attention dynamics in large vision language models (LVLMs), yet existing methods remain limited in reliably distinguishing hallucinated from correct outputs, primarily because they rely solely on forward-pass attention, ignoring gradient-based signals that reveal how token influence propagates through the model. To bridge this gap, we introduce LVLMs-Saliency, a gradient-aware diagnostic tool that quantifies the grounding strength of each output token by fusing attention weights with their gradients. Through this analysis, we identify a decisive pattern: hallucinations occur when prior output tokens show low saliency to the next-token prediction, indicating a failure of contextual memory. Building on this insight, we propose a dual-mechanism inference-time framework: (1) Saliency-Guided Rejection Sampling (SGRS), which dynamically filters candidate tokens during decoding by rejecting those with saliency below a context-adaptive threshold, thereby preventing coherence-breaking tokens from entering the sequence; and (2) Local Coherence Reinforcement (LocoRE), a lightweight plug-and-play module that strengthens attention from the current token to its most recent outputs, actively counteracting the "forgetting" behavior identified by LVLMs-Saliency. Experimental results demonstrate that our method significantly reduces hallucinations across multiple LVLMs, offering a robust and interpretable solution to improve model reliability.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a gradient-aware diagnostic tool (LVLMs-Saliency) and two inference-time mitigation techniques (SGRS and LocoRE) for hallucination reduction in large vision language models. It resides in the Mechanistic Analysis of Hallucination Causes leaf, which contains only three papers total. This is a relatively sparse research direction compared to more crowded branches like Contrastive Decoding Methods or Object Hallucination Evaluation, suggesting the mechanistic perspective on hallucination causation remains underexplored despite its foundational importance for understanding model failures.

The taxonomy reveals dense activity in inference-time interventions (four subcategories with multiple papers each) and training-based approaches (five subcategories), while mechanistic analysis remains lean. Neighboring leaves include Specialized Hallucination Evaluation and evaluation benchmarks, which measure hallucinations empirically but do not probe internal mechanisms. The paper bridges mechanistic understanding and practical intervention: its diagnostic tool analyzes attention-gradient patterns (mechanistic), while SGRS and LocoRE apply these insights during decoding (inference-time). This dual focus spans categories, connecting the sparse mechanistic-analysis leaf with the crowded decoding-intervention space.

Among the thirty candidates examined, the gradient-aware diagnostic contribution (Contribution A) shows overlap: two of the ten candidates examined for it can refute it, indicating that prior work exists on attention-gradient analysis for hallucination detection. The two mitigation techniques appear more novel: for SGRS and LocoRE, ten candidates each were examined, with zero refutations found. This suggests the diagnostic mechanism has precedent within the limited search scope, while the specific decoding strategies (rejection sampling via saliency thresholds, coherence reinforcement modules) may represent less-explored territory within the examined candidate pool.

Based on the limited search of thirty semantically similar papers, the mechanistic diagnostic tool builds on existing attention-gradient work, while the inference-time mitigation techniques show fewer overlaps. The sparse mechanistic analysis leaf and the absence of refutations for SGRS/LocoRE suggest potential novelty, though the restricted search scope cannot confirm exhaustive uniqueness. The paper's positioning across mechanistic understanding and practical decoding intervention reflects an underrepresented bridge in current literature.

Taxonomy

50 Core-task Taxonomy Papers
3 Claimed Contributions
30 Contribution Candidate Papers Compared
2 Refutable Papers

Research Landscape Overview

Core task: Mitigating hallucinations in large vision language models. The field has organized itself around several complementary strategies. Hallucination Analysis and Evaluation establishes diagnostic foundations, examining root causes and developing benchmarks such as Hallusionbench[36] and Object Hallucination Analysis[3]. Inference-Time Decoding Interventions modify generation on-the-fly through techniques like Visual Contrastive Decoding[37] and CLIP-Guided Decoding[43], while Training-Based Mitigation Approaches reshape model behavior via methods such as V-DPO[39] and Reflective Instruction Tuning[22]. Post-Hoc Correction and Detection tools like Woodpecker[10] refine outputs after generation, and Domain-Specific and Application-Oriented Mitigation addresses specialized contexts including medical applications. Cross-Cutting Perspectives and Related Work situates these efforts within broader foundation model challenges, as surveyed in Hallucination Survey[1] and Visual Hallucination Survey[7].

Recent mechanistic investigations probe how hallucinations emerge within model architectures, contrasting with purely empirical mitigation strategies. Works like Information Storage Transfer[25] and Visual Object Analysis[41] dissect internal representations to understand failure modes at a granular level. Saliency Drops[0] fits naturally within this mechanistic branch, analyzing how attention patterns and saliency shifts contribute to hallucinatory outputs, complementing neighboring studies that trace information flow and object-level processing. This mechanistic perspective differs from intervention-focused approaches like Image-Biased Decoding[6] or training adjustments in MH-PEFT[18], offering interpretability that can inform both decoding strategies and training objectives.

The interplay between understanding root causes and deploying practical fixes remains a central tension, with mechanistic analyses providing theoretical grounding for the diverse mitigation techniques proliferating across inference-time, training-based, and post-hoc correction branches.

Claimed Contributions

LVLMs-Saliency: gradient-aware diagnostic tool for hallucination detection

The authors propose LVLMs-Saliency, an unsupervised metric that combines attention weights with their gradients to measure how strongly each previously generated output token influences the next token prediction. This reveals a pattern where hallucinations occur when prior output tokens show low saliency to the next token, indicating contextual memory failure.
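The fusion of attention weights with their gradients can be sketched as follows. This is a minimal illustration, not the paper's exact formula: the fusion rule |a · ∂L/∂a| follows common attention-gradient saliency practice, and `token_saliency` / `low_saliency_flag` are hypothetical names; the attention weights and their gradients are assumed to be extracted from the model elsewhere.

```python
# Hedged sketch of an attention-gradient saliency score per prior output token.
# attn[i]: attention weight from the current query position to prior token i.
# grad[i]: gradient of the next-token loss w.r.t. attn[i] (precomputed).

def token_saliency(attn, grad):
    """Fuse attention with its gradient: s_i = |a_i * g_i| (assumed rule)."""
    return [abs(a * g) for a, g in zip(attn, grad)]

def low_saliency_flag(saliency, threshold):
    """Flag a potential hallucination when every prior output token's
    saliency falls below the threshold (the contextual-memory failure
    pattern described above)."""
    return all(s < threshold for s in saliency)

# Toy usage with made-up numbers
attn = [0.40, 0.30, 0.20, 0.10]
grad = [0.05, -0.02, 0.01, 0.00]
scores = token_saliency(attn, grad)   # [0.02, 0.006, 0.002, 0.0]
print(low_saliency_flag(scores, threshold=0.05))  # True: all prior tokens weak
```

In a real pipeline the gradients would come from a backward pass through the attention map; here they are stand-in values to show the fusion and flagging logic only.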

10 retrieved papers
Can Refute
Saliency-Guided Rejection Sampling (SGRS)

SGRS is a proactive filtering mechanism that evaluates candidate output tokens before commitment by computing their saliency and rejecting those with weak contextual dependencies (low saliency below a context-adaptive threshold). This prevents coherence-breaking tokens from entering the sequence and triggering hallucinations.
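The reject-and-resample loop can be sketched as below. This is a simplified illustration under stated assumptions: the paper does not specify the threshold rule, so `adaptive_threshold` here uses a fraction of the mean saliency of recently accepted tokens as a stand-in, and `sgrs_pick` is a hypothetical name.

```python
import random

def adaptive_threshold(recent_saliencies, factor=0.5):
    """Context-adaptive threshold: a fixed fraction of the mean saliency of
    recently accepted tokens (an assumed rule, for illustration only)."""
    if not recent_saliencies:
        return 0.0
    return factor * sum(recent_saliencies) / len(recent_saliencies)

def sgrs_pick(candidates, recent_saliencies, max_tries=10, rng=random):
    """Sample a token by probability, rejecting draws whose saliency falls
    below the context-adaptive threshold; fall back to the highest-saliency
    candidate if every draw is rejected.

    candidates: list of (token, probability, saliency) triples.
    """
    thr = adaptive_threshold(recent_saliencies)
    probs = [p for _, p, _ in candidates]
    for _ in range(max_tries):
        token, _, saliency = rng.choices(candidates, weights=probs, k=1)[0]
        if saliency >= thr:
            return token  # accepted: strong enough contextual dependency
    # every draw was rejected: keep the most strongly grounded candidate
    return max(candidates, key=lambda c: c[2])[0]
```

The fallback to the highest-saliency candidate is one possible design choice; it guarantees decoding never stalls even when the whole candidate pool sits below the threshold.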

10 retrieved papers
Local Coherence Reinforcement (LocoRE)

LocoRE is a reactive stabilization mechanism that operates after token acceptance by strengthening attention weights from the current query token to the most recent output tokens using a distance-aware gain factor. This ensures the model maintains strong attentional links to its immediate past, counteracting the forgetting behavior identified by LVLMs-Saliency.
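The reinforcement step can be sketched as a post-hoc rescaling of the attention row, followed by renormalization. The linear decay of the gain with distance is an assumed form (the paper only states the gain is distance-aware), and `locore_reinforce`, `window`, and `alpha` are illustrative names, not the authors' API.

```python
def locore_reinforce(attn, window=3, alpha=0.3):
    """Strengthen attention from the current query to its most recent
    `window` output tokens with a distance-aware gain, then renormalize.

    attn: attention weights over prior positions (most recent token last).
    Assumed gain: g(d) = 1 + alpha * (window - d + 1) / window for distance
    d <= window (so the nearest token gets the largest boost), else 1.
    """
    n = len(attn)
    boosted = []
    for i, a in enumerate(attn):
        d = n - i  # distance from the current query; d = 1 is most recent
        if d <= window:
            gain = 1.0 + alpha * (window - d + 1) / window
        else:
            gain = 1.0
        boosted.append(a * gain)
    z = sum(boosted)  # renormalize so the row is still a distribution
    return [b / z for b in boosted]

# Toy usage: a uniform row gets tilted toward the most recent tokens
print(locore_reinforce([0.25, 0.25, 0.25, 0.25]))
```

Renormalizing after the boost keeps the attention row a valid probability distribution, so the module can be dropped into an existing decoder without retraining, matching the plug-and-play framing above.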

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

LVLMs-Saliency: gradient-aware diagnostic tool for hallucination detection

The authors propose LVLMs-Saliency, an unsupervised metric that combines attention weights with their gradients to measure how strongly each previously generated output token influences the next token prediction. This reveals a pattern where hallucinations occur when prior output tokens show low saliency to the next token, indicating contextual memory failure.

Contribution

Saliency-Guided Rejection Sampling (SGRS)

SGRS is a proactive filtering mechanism that evaluates candidate output tokens before commitment by computing their saliency and rejecting those with weak contextual dependencies (low saliency below a context-adaptive threshold). This prevents coherence-breaking tokens from entering the sequence and triggering hallucinations.

Contribution

Local Coherence Reinforcement (LocoRE)

LocoRE is a reactive stabilization mechanism that operates after token acceptance by strengthening attention weights from the current query token to the most recent output tokens using a distance-aware gain factor. This ensures the model maintains strong attentional links to its immediate past, counteracting the forgetting behavior identified by LVLMs-Saliency.