The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models

ICLR 2026 Conference Submission, Anonymous Authors

OpenReview Score: 5.5

Large Language Model, Large Reasoning Model, Overthinking

Abstract:

Reasoning models often exhibit overthinking, characterized by redundant reasoning steps. We identify internal bias elicited by the input question as a key trigger of this behavior. Upon encountering a problem, the model immediately forms a preliminary guess about the answer. We term this guess an internal bias because it may not be explicitly generated and it arises without systematic reasoning. When this guess conflicts with its subsequent reasoning, the model tends to engage in excessive reflection, wasting computation. We validate the association between internal bias and overthinking across multiple models and diverse reasoning tasks. To establish the causal relationship more rigorously, we conduct two counterfactual interventions. These show that removing the input question after the model generates an answer reduces redundant reasoning across various complex reasoning tasks, and that manually injecting bias affects overthinking accordingly. Further interpretability experiments suggest that excessive attention to the input question is a key mechanism through which internal bias influences subsequent reasoning trajectories. Finally, we evaluate several methods aimed at mitigating overthinking, and the influence of internal bias persists under all of them.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper identifies internal bias (preliminary guesses formed before systematic reasoning) as a trigger for overthinking in reasoning models. It sits within the 'Internal Bias as Overthinking Trigger' leaf of the taxonomy, which contains only two papers. This is a relatively sparse research direction within the broader field of overthinking characterization. The taxonomy shows that overthinking mechanisms have attracted attention across multiple branches. However, the specific focus on internal bias as a causal trigger remains underexplored compared with adjacent areas such as mitigation strategies or robustness testing.

The taxonomy reveals neighboring work in 'Overthinking Patterns and Dynamics' examining self-affirmation reflections and reasoning length effects, and 'Cognitive Conviction and Belief Quantification' measuring depth of belief in model outputs. The paper's emphasis on attention mechanisms and counterfactual interventions connects it to robustness studies under 'Structural Perturbation and Misleading Context,' though those focus on external perturbations rather than internal bias formation. The mitigation branches explore debiasing training and adaptive budget allocation, representing downstream applications of the characterization work this paper contributes to.

Thirty candidates were examined in total, ten per contribution. The first contribution, identifying internal bias as an overthinking trigger, has one refutable candidate among its ten, suggesting some prior recognition of this phenomenon. The second contribution, counterfactual interventions (removing the input question and manually injecting bias), has no candidate among its ten that clearly refutes it, suggesting methodological novelty in establishing causality. The third contribution, the attention-based mechanism, likewise has no refutations among its ten candidates. Because the search scope is limited, these statistics reflect top semantic matches rather than exhaustive coverage. This caveat matters particularly given the sparse two-paper leaf this work occupies.

Based on the limited thirty-candidate search, the paper appears to advance understanding of internal bias mechanisms through novel causal interventions and interpretability analyses. The sparse taxonomy leaf and low refutation rates for two of three contributions suggest potential novelty, though the single refutation for the core bias identification indicates some conceptual overlap exists. The analysis cannot assess whether deeper literature searches or domain-specific venues might reveal additional prior work on bias-triggered overthinking phenomena.

Taxonomy

[Taxonomy figure legend: core-task taxonomy papers, claimed contributions, contribution candidate papers compared, refutable paper]

Research Landscape Overview

Core task: internal bias triggering overthinking in reasoning models. The field is a multifaceted investigation into how reasoning systems become trapped in unproductive cycles. The taxonomy organizes research into six main branches:

- characterizing overthinking mechanisms and their triggers
- developing mitigation strategies
- ensuring reasoning robustness and reliability
- drawing parallels with human cognitive biases
- exploring philosophical foundations
- applying insights to specialized domains

Works such as Internal Bias Overthinking[2] and First Impression Problem[0] exemplify efforts to understand how internal biases initiate excessive deliberation. The mitigation branches explore interventions such as debiasing training and the suppression of self-affirmation reflections. The taxonomy reveals a tension between descriptive studies that diagnose overthinking and prescriptive methods that aim to prevent or correct it.

Particularly active lines of work examine the interplay between a model's initial biases and the quality of its subsequent reasoning. Several studies investigate how early-stage judgments can cascade into overthinking. First Impression Problem[0] focuses specifically on how initial biases trigger unproductive extended reasoning. It sits closely alongside Internal Bias Overthinking[2], which also explores internal triggers of excessive deliberation. Nearby efforts such as Cothink[1] and When More Less[5] probe the broader question of when additional reasoning steps help or harm performance, highlighting the trade-off between thoroughness and efficiency. Meanwhile, mitigation-focused papers such as Suppression Self-Affirmation[3] and Deliberation Debias Training[4] propose interventions to counteract these biases.

The paper under review occupies a niche within overthinking characterization. It emphasizes first impressions as a specific internal-bias mechanism, which distinguishes it from more general studies of reasoning failures or robustness challenges.

Claimed Contributions

Identification of internal bias as a trigger for overthinking

Can Refute (10 retrieved papers)

The authors find that reasoning models form a preliminary guess (an internal bias) upon encountering a problem. When this guess conflicts with their subsequent reasoning, the models engage in excessive reflection, causing overthinking and wasted computation.
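The claimed association can be made concrete with a minimal sketch of one way to test it. The sketch compares reflection behavior when a model's preliminary guess agrees with its final answer against cases where the two conflict. It is not the authors' pipeline. It assumes each sample already records a preliminary guess (for example, one elicited by asking for an immediate answer) and a reasoning trace split into steps. The cue list `REFLECTION_CUES`, the `Sample` type and both helper functions are hypothetical illustrations.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical lexical cues marking a reflection step (an assumption, not the
# paper's definition; real analyses may use other markers or a classifier).
REFLECTION_CUES = ("wait", "alternatively", "let me double-check", "hmm")


@dataclass
class Sample:
    preliminary_guess: str  # answer given before any reasoning (assumed available)
    final_answer: str       # answer reached at the end of the reasoning trace
    steps: list[str]        # reasoning trace split into steps


def count_reflections(steps: list[str]) -> int:
    """Count steps that begin with one of the reflection cues (case-insensitive)."""
    return sum(step.strip().lower().startswith(REFLECTION_CUES) for step in steps)


def reflection_by_conflict(samples: list[Sample]) -> dict[str, float]:
    """Mean reflection count when the guess agrees vs. conflicts with the final answer."""
    groups: dict[str, list[int]] = {"agree": [], "conflict": []}
    for s in samples:
        key = "agree" if s.preliminary_guess == s.final_answer else "conflict"
        groups[key].append(count_reflections(s.steps))
    # An empty group yields NaN instead of raising, so callers can detect it.
    return {k: mean(v) if v else float("nan") for k, v in groups.items()}


# Toy, illustrative samples only (not data from the paper).
samples = [
    Sample("12", "12", ["Compute 3*4.", "So 12."]),
    Sample("10", "12", ["Compute 3*4.", "Wait, check again.", "Hmm, 12.", "Alternatively verify: 12."]),
]
print(reflection_by_conflict(samples))
```

Under the paper's hypothesis, the "conflict" group would show more reflection on real traces. This toy input only exercises the bookkeeping.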

Counterfactual interventions demonstrating causal relationship

10 retrieved papers

The authors propose two counterfactual interventions. The first removes the input question after the model has generated an answer. The second manually injects bias. These interventions demonstrate a causal link between internal bias and overthinking, with question removal reducing redundant reasoning by 31% to 53%.
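The two interventions, and how a relative reduction such as the reported 31% to 53% could be computed, are illustrated in the minimal sketch below. It rests on three assumptions:

- "Redundant reasoning" is approximated as the whitespace tokens generated after the first step that states the final answer.
- Question removal rebuilds the continuation context without the question.
- Bias injection appends a suggested answer to the question.

`build_context`, `inject_bias`, `redundant_tokens` and `relative_reduction` are hypothetical helpers, not the paper's exact definitions. The step lists are toy inputs.

```python
def build_context(question: str, reasoning_so_far: list[str], remove_question: bool) -> str:
    """Hypothetical: assemble the context for continued generation.

    In the counterfactual condition the question is dropped once an answer has
    appeared, so the model can no longer attend back to it.
    """
    parts = [] if remove_question else [f"Question: {question}"]
    parts.extend(reasoning_so_far)
    return "\n".join(parts)


def inject_bias(question: str, guess: str) -> str:
    """Hypothetical bias injection: append a suggested answer to the question."""
    return f"{question}\nHint: the answer might be {guess}."


def redundant_tokens(steps: list[str], answer: str) -> int:
    """Whitespace-token count of all steps after the first step containing the answer."""
    for i, step in enumerate(steps):
        if answer in step:
            return sum(len(s.split()) for s in steps[i + 1:])
    return 0  # answer never stated: nothing is counted as redundant here


def relative_reduction(baseline: int, intervened: int) -> float:
    """Fractional reduction of redundant tokens under the intervention."""
    if baseline == 0:
        return 0.0
    return (baseline - intervened) / baseline


# Toy traces for illustration only (not results from the paper).
baseline_steps = ["3*4 = 12.", "Wait, is it 12?", "Let me recheck: yes, 12."]
intervened_steps = ["3*4 = 12.", "Done."]
reduction = relative_reduction(
    redundant_tokens(baseline_steps, "12"), redundant_tokens(intervened_steps, "12")
)
print(f"{reduction:.0%}")
```

Counting tokens after the first stated answer is one common way to operationalize redundancy. A tokenizer-based count or a correctness-aware cutoff would be natural refinements.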

Attention-based mechanism explaining bias influence

10 retrieved papers

Through interpretability analysis, the authors find that models attend excessively to the input question when deciding whether to reflect further. This heightened attention reactivates the internal bias and influences the decision to take additional reasoning steps.
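One way to quantify "attention to the input question" is the share of attention mass that the current decoding position places on the question's tokens, averaged over heads. The minimal sketch below assumes the attention weights for a single query position are already extracted as nested lists indexed `[head][key_position]`, and that the question occupies a known contiguous token range. `question_attention_share` is a hypothetical helper. The paper's choice of layers, heads and positions is not specified here.

```python
def question_attention_share(attn: list[list[float]], q_start: int, q_end: int) -> float:
    """Hypothetical helper: mean over heads of the attention mass on keys [q_start, q_end).

    attn[h][k] is head h's attention weight from the current query position to key k.
    Each head's mass is normalized by its row sum, so small rounding errors in
    extracted weights do not bias the share.
    """
    if not attn:
        raise ValueError("need at least one attention head")
    shares = [sum(head[q_start:q_end]) / sum(head) for head in attn]
    return sum(shares) / len(shares)


# Toy weights for one query position; the question is assumed to be tokens 0-1.
attn = [
    [0.5, 0.3, 0.1, 0.1],  # head 0 puts 0.8 of its mass on the question
    [0.2, 0.2, 0.3, 0.3],  # head 1 puts 0.4 of its mass on the question
]
print(question_attention_share(attn, 0, 2))
```

Comparing this share at positions where the model decides to reflect against positions where it stops would mirror the kind of contrast the contribution describes.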

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[2] Internal Bias in Reasoning Models leads to Overthinking

R Dang, S Huang, J Chen (2025)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Identification of internal bias as a trigger for overthinking

[2] Internal Bias in Reasoning Models leads to Overthinking (Can Refute)
[1] Cothink: Token-efficient reasoning via instruct models guiding reasoning models (Cannot Refute)
[4] Examining the role of deliberation in de-bias training (Cannot Refute)
[5] When more is less: Understanding chain-of-thought length in LLMs (Cannot Refute)
[7] Safety in large reasoning models: A survey (Cannot Refute)
[8] CodeCrash: Exposing LLM Fragility to Misleading Natural Language in Code Reasoning (Cannot Refute)
[13] Marco-o1 v2: Towards Widening The Distillation Bottleneck for Reasoning Models (Cannot Refute)
[43] Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning (Cannot Refute)
[44] Fast and slow thinking; and the problem of conflating clinical reasoning and ethical deliberation in acute decision-making (Cannot Refute)
[45] Reasoning is for arguing: Understanding the successes and failures of deliberation (Cannot Refute)

Contribution: Counterfactual interventions demonstrating causal relationship

[23] Cutting off the head ends the conflict: A mechanism for interpreting and mitigating knowledge conflicts in language models (Cannot Refute)
[24] Com: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models (Cannot Refute)
[25] Mitigating confounding bias for recommendation via counterfactual inference (Cannot Refute)
[26] Mitigating popularity bias in recommendation via counterfactual inference (Cannot Refute)
[27] CVLN-Think: Causal Inference with Counterfactual Style Adaptation for Continuous Vision-and-Language Navigation (Cannot Refute)
[28] Act before you overThink: Make decisions easier and liberate your mind (Cannot Refute)
[29] Benchmarking Explainability Methods Across Vision and Language Tasks: A Practitioner's Perspective (Cannot Refute)
[30] CaRT: Teaching LLM Agents to Know When They Know Enough (Cannot Refute)
[31] Children's Counterfactual Reasoning About Causally Overdetermined Events (Cannot Refute)
[32] Systematic extension of CRISP-DM by structured mapping of emerging regulatory requirements on bias in AI (Cannot Refute)

Contribution: Attention-based mechanism explaining bias influence

[33] It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization (Cannot Refute)
[34] A Comprehensive Survey on Deep Learning Techniques in Educational Data Mining: Y. Lin et al. (Cannot Refute)
[35] FairForensics: mitigating attribute bias in deepfake detection by integrating texture and attribute features (Cannot Refute)
[36] Action understanding and active inference (Cannot Refute)
[37] An Analysis for Reasoning Bias of Language Models with Small Initialization (Cannot Refute)
[38] Attention over learned object embeddings enables complex visual reasoning (Cannot Refute)
[39] Graph reasoning transformers for knowledge-aware question answering (Cannot Refute)
[40] Debiasing pretrained text encoders by paying attention to paying attention (Cannot Refute)
[41] Forgetting-aware Linear Bias for Attentive Knowledge Tracing (Cannot Refute)
[42] Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment (Cannot Refute)
