The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Model, Large Reasoning Model, Overthinking
Abstract:

Reasoning models often exhibit overthinking, characterized by redundant reasoning steps. We identify internal bias elicited by the input question as a key trigger of such behavior. Upon encountering a problem, the model immediately forms a preliminary guess about the answer, which we term an internal bias because it may not be explicitly generated and it arises without systematic reasoning. When this guess conflicts with the model's subsequent reasoning, the model tends to engage in excessive reflection, resulting in wasted computation. We validate the association between internal bias and overthinking across multiple models and diverse reasoning tasks. To demonstrate the causal relationship more rigorously, we conduct two counterfactual interventions, showing that removing the input question after the model generates an answer reduces redundant reasoning across various complex reasoning tasks, and that manually injecting bias affects overthinking accordingly. Further interpretability experiments suggest that excessive attention to the input question serves as a key mechanism through which internal bias influences subsequent reasoning trajectories. Finally, we evaluate several methods aimed at mitigating overthinking, yet the influence of internal bias persists under all conditions.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper identifies internal bias—preliminary guesses formed before systematic reasoning—as a trigger for overthinking in reasoning models. It sits within the 'Internal Bias as Overthinking Trigger' leaf of the taxonomy, which contains only two papers total. This represents a relatively sparse research direction within the broader field of overthinking characterization. The taxonomy shows that while overthinking mechanisms have attracted attention across multiple branches, the specific focus on internal bias as a causal trigger remains underexplored compared to adjacent areas like mitigation strategies or robustness testing.

The taxonomy reveals neighboring work in 'Overthinking Patterns and Dynamics' examining self-affirmation reflections and reasoning length effects, and 'Cognitive Conviction and Belief Quantification' measuring depth of belief in model outputs. The paper's emphasis on attention mechanisms and counterfactual interventions connects it to robustness studies under 'Structural Perturbation and Misleading Context,' though those focus on external perturbations rather than internal bias formation. The mitigation branches explore debiasing training and adaptive budget allocation, representing downstream applications of the characterization work this paper contributes to.

Among thirty candidates examined, the first contribution—identifying internal bias as an overthinking trigger—shows one refutable candidate from ten examined, suggesting some prior recognition of this phenomenon. The second contribution on counterfactual interventions (removing input questions, manually injecting bias) examined ten candidates with none clearly refuting it, indicating methodological novelty in establishing causality. The third contribution on attention-based mechanisms similarly found no refutations among ten candidates. The limited search scope means these statistics reflect top semantic matches rather than exhaustive coverage, particularly given the sparse two-paper leaf this work occupies.
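The counts in the paragraph above can be tallied in a few lines. This is only a bookkeeping sketch: the dictionary keys are shorthand labels chosen here for illustration, and the numbers are the ones stated in this report (ten candidates per contribution, one refutable candidate for the first).

```python
# Per-contribution search results as stated in the report:
# (candidates examined, candidates judged able to refute).
# The key names are shorthand labels, not identifiers from the pipeline.
results = {
    "internal bias as overthinking trigger": (10, 1),
    "counterfactual interventions": (10, 0),
    "attention-based mechanism": (10, 0),
}

total_examined = sum(examined for examined, _ in results.values())
total_refutable = sum(refutable for _, refutable in results.values())
unrefuted = [name for name, (_, refutable) in results.items() if refutable == 0]

print(f"examined: {total_examined}, refutable: {total_refutable}")
print(f"contributions with no refuting candidate: {len(unrefuted)} of {len(results)}")
```

The totals match the summary figures reported under Taxonomy (30 candidates compared, 1 refutable paper).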

Based on the limited thirty-candidate search, the paper appears to advance understanding of internal bias mechanisms through novel causal interventions and interpretability analyses. The sparse taxonomy leaf and low refutation rates for two of three contributions suggest potential novelty, though the single refutation for the core bias identification indicates some conceptual overlap exists. The analysis cannot assess whether deeper literature searches or domain-specific venues might reveal additional prior work on bias-triggered overthinking phenomena.

Taxonomy

Core-task Taxonomy Papers: 22
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Paper: 1

Research Landscape Overview

Core task: Internal bias triggering overthinking in reasoning models. The field structure reflects a multifaceted investigation into how reasoning systems can become trapped in unproductive cycles. The taxonomy organizes research into several main branches: characterizing overthinking mechanisms and their triggers, developing mitigation strategies, ensuring reasoning robustness and reliability, drawing parallels with human cognitive biases, exploring philosophical foundations, and applying insights to specialized domains. Works like Internal Bias Overthinking[2] and First Impression Problem[0] exemplify efforts to understand how internal biases initiate excessive deliberation, while branches on mitigation approaches explore interventions such as debiasing training and self-affirmation techniques. The taxonomy reveals a tension between descriptive studies that diagnose overthinking phenomena and prescriptive methods that aim to prevent or correct them.

Particularly active lines of work examine the interplay between initial model biases and subsequent reasoning quality. Several studies investigate how early-stage judgments can cascade into overthinking, with First Impression Problem[0] focusing specifically on how initial biases trigger unproductive extended reasoning. This work sits closely alongside Internal Bias Overthinking[2], which similarly explores internal triggers for excessive deliberation. Nearby efforts like Cothink[1] and When More Less[5] probe the broader question of when additional reasoning steps help versus harm performance, highlighting trade-offs between thoroughness and efficiency. Meanwhile, mitigation-focused papers such as Suppression Self-Affirmation[3] and Deliberation Debias Training[4] propose interventions to counteract these biases.

The original paper occupies a niche within overthinking characterization, emphasizing the role of first impressions as a specific internal bias mechanism that distinguishes it from more general studies of reasoning failures or robustness challenges.

Claimed Contributions

Identification of internal bias as a trigger for overthinking

The authors identify that reasoning models form a preliminary guess (internal bias) upon encountering a problem, and when this guess conflicts with subsequent reasoning, the model engages in excessive reflection, causing overthinking and wasted computation.

10 retrieved papers
Can Refute

Counterfactual interventions demonstrating causal relationship

The authors propose two counterfactual validation methods: removing the input question after an answer is generated and manually injecting bias. These interventions demonstrate a causal link between internal bias and overthinking, with question removal reducing redundant reasoning by 31% to 53%.

10 retrieved papers
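
A figure such as "31% to 53%" is most naturally read as a relative reduction against the unmodified run. The sketch below shows that arithmetic only. It assumes, hypothetically, that redundant reasoning is counted in tokens; the report does not state how the authors measure it, and the function name and the example numbers are illustrative, not results from the paper.

```python
def relative_reduction(baseline_tokens: float, intervened_tokens: float) -> float:
    """Fraction by which redundant reasoning shrinks under an intervention.

    baseline_tokens:   redundant tokens in the unmodified run (assumed unit)
    intervened_tokens: redundant tokens after the input question is removed
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - intervened_tokens) / baseline_tokens


# Made-up numbers for illustration only: 1000 redundant tokens at baseline,
# 600 after the intervention, i.e. a 40% relative reduction.
example = relative_reduction(1000, 600)
print(f"{example:.0%}")
```

Under this reading, a reported 31% reduction would mean the intervened run retains 69% of the baseline's redundant reasoning, and a 53% reduction would mean it retains 47%.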
Attention-based mechanism explaining bias influence

The authors discover through interpretability analysis that models excessively attend to the input question when deciding whether to reflect further. This heightened attention reactivates internal bias, influencing the decision to engage in additional reasoning steps.

10 retrieved papers
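
One simple way to quantify "excessive attention to the input question" is the share of attention mass that a given decoding position places on the question tokens, compared with what uniform attention would give. The sketch below illustrates that idea only. It is not the authors' interpretability method: the function names, the uniform baseline, and the toy attention row are all assumptions made for illustration.

```python
from typing import Sequence


def question_attention_share(attn_row: Sequence[float], q_start: int, q_end: int) -> float:
    """Share of one position's attention that falls on question tokens.

    attn_row:      non-negative attention weights over all context positions
    q_start/q_end: half-open index range [q_start, q_end) of the question
    """
    total = sum(attn_row)
    if total <= 0:
        raise ValueError("attention row must have positive mass")
    return sum(attn_row[q_start:q_end]) / total


def excess_over_uniform(attn_row: Sequence[float], q_start: int, q_end: int) -> float:
    """Ratio of the observed share to the share under uniform attention.

    Values above 1 mean the question receives more attention than its
    length alone would predict (a hypothetical baseline chosen here).
    """
    uniform_share = (q_end - q_start) / len(attn_row)
    return question_attention_share(attn_row, q_start, q_end) / uniform_share


# Toy row over five context positions; positions 1 and 2 are the question.
row = [0.1, 0.3, 0.2, 0.1, 0.3]
print(question_attention_share(row, 1, 3))
print(excess_over_uniform(row, 1, 3))
```

In practice the row would come from a model's attention weights at the token where it decides whether to reflect further, typically averaged over heads or layers; which layers and heads are informative is an empirical question that this sketch does not address.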
