Efficient Reasoning with Balanced Thinking

ICLR 2026 Conference SubmissionAnonymous Authors

OpenReview Score: 7.0 Download Report PDF

Large Reasoning ModelsEfficient Reasoning

Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite inherent capabilities. These issues lead to inefficiencies and potential inaccuracies, limiting practical deployment in resource-constrained settings. Existing methods to mitigate overthinking, such as suppressing reflective keywords or adjusting reasoning length, may inadvertently induce underthinking, compromising accuracy. Therefore, we propose \textsc{ReBalance}, a training-free framework that achieves efficient reasoning with balanced thinking. \textsc{ReBalance} leverages confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking via consistent overconfidence. By aggregating hidden states from a small-scale dataset into reasoning mode prototypes, we compute a steering vector to guide LRMs’ reasoning trajectories. A dynamic control function modulates this vector’s strength and direction based on real-time confidence, pruning redundancy during overthinking, and promoting exploration during underthinking. Extensive experiments conducted on four models ranging from 0.5B to 32B, and across nine benchmarks in math reasoning, general question answering, and coding tasks demonstrate that \textsc{ReBalance} effectively reduces output redundancy while improving accuracy, offering a general, training-free, and plug-and-play strategy for efficient and robust LRM deployment. Code and models will be made publicly available.

Abstract:

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ReBalance, a training-free framework that uses confidence signals to detect and mitigate both overthinking and underthinking in large reasoning models. It resides in the 'Confidence-Based Reasoning Control' leaf under 'Adaptive Reasoning Control and Difficulty-Aware Methods', which contains only two papers including this one. This is a relatively sparse research direction within a broader taxonomy of 50 papers across 22 leaf nodes, suggesting the specific approach of using confidence variance and overconfidence patterns for dual-sided reasoning control is not yet heavily explored.

The taxonomy reveals that ReBalance sits within a larger ecosystem of adaptive reasoning methods. Its parent branch includes 'Difficulty-Adaptive Reasoning Frameworks' (3 papers) and 'Dynamic Early Exit and Truncation Mechanisms' (2 papers), which address similar efficiency goals through different signals—explicit difficulty modeling or self-termination criteria. Neighboring branches include 'Reinforcement Learning and Training-Based Optimization' (5 papers) and 'Post-Hoc Reasoning Refinement' (3 papers), which tackle reasoning efficiency through training interventions or post-generation filtering rather than inference-time steering. The taxonomy's scope notes clarify that confidence-based methods like ReBalance are distinguished from difficulty-aware approaches by their reliance on model uncertainty rather than upfront problem characterization.

Among 30 candidates examined, the contribution-level analysis shows mixed novelty signals. 'Confidence as a continuous indicator' examined 10 candidates with 1 refutable match, suggesting some prior work uses confidence for reasoning control but perhaps not in the dual-detection manner proposed here. The 'ReBalance framework' contribution also examined 10 candidates with 1 refutable match, indicating the specific steering vector approach may have precedent. The 'plug-and-play solution' contribution examined 10 candidates with 2 refutable matches, suggesting training-free efficiency improvements are more established. These statistics reflect a limited search scope, not exhaustive coverage of the field.

Given the sparse population of the confidence-based control leaf and the limited search scale, the work appears to occupy a relatively underexplored niche within adaptive reasoning methods. The analysis captures top-30 semantic matches and does not guarantee comprehensive coverage of all relevant prior work, particularly in adjacent areas like difficulty-aware frameworks or latent reasoning representations. The dual focus on both overthinking and underthinking through confidence dynamics may differentiate this work from single-sided approaches, though the limited refutable matches suggest some conceptual overlap exists within the examined candidates.

Taxonomy

Core-task Taxonomy Papers

Claimed Contributions

Contribution Candidate Papers Compared

Refutable Paper

Research Landscape Overview

Core task: balancing overthinking and underthinking in large reasoning models. The field has organized itself around several complementary perspectives. One major branch focuses on characterizing and analyzing reasoning inefficiencies—understanding when models generate excessive or insufficient reasoning steps and why these patterns emerge. Another set of approaches centers on adaptive reasoning control and difficulty-aware methods, which dynamically adjust computational effort based on problem characteristics; works like DAST[3] and Reasoning Strength Planning[5] exemplify this direction. Reinforcement learning and training-based optimization form a third pillar, aiming to teach models efficient reasoning habits during training. Meanwhile, latent and compressed reasoning representations explore whether reasoning can be internalized or condensed, and post-hoc refinement methods attempt to optimize reasoning traces after generation. System-level efficiency and serving optimization address deployment concerns, while evaluation frameworks and benchmarking provide standardized testbeds for comparing approaches. Security and adversarial aspects examine vulnerabilities introduced by reasoning processes, and specialized applications demonstrate domain-specific tuning. Within the adaptive control branch, a particularly active line of work uses confidence signals to decide when to stop reasoning. Balanced Thinking[0] sits squarely in this confidence-based reasoning control cluster, alongside Meta-Cognitive Reasoning[25], which similarly leverages model self-assessment to regulate thinking depth. These methods contrast with difficulty-aware approaches like DAST[3], which predict problem hardness upfront rather than monitoring confidence during generation. The central tension across these branches involves trade-offs between accuracy, efficiency, and robustness: some methods prioritize minimizing wasted computation on easy problems, while others focus on ensuring sufficient reasoning for hard cases. Balanced Thinking[0] addresses this by using confidence thresholds to terminate reasoning adaptively, positioning itself as a middle ground that responds to the model's own uncertainty rather than relying solely on external difficulty estimates or fixed budgets.

Claimed Contributions

Confidence as a continuous indicator of reasoning dynamics

Can Refute

10 retrieved papers

The authors demonstrate that stepwise confidence and confidence variance can reliably indicate when large reasoning models exhibit overthinking (high variance) or underthinking (persistent overconfidence), providing a foundation for dynamic control of reasoning behavior.

10 retrieved papers

Can Refute

REBALANCE framework for dynamic reasoning control

Can Refute

10 retrieved papers

The authors introduce REBALANCE, a training-free method that extracts steering vectors from hidden states and applies a dynamic control function to adjust reasoning trajectories in real-time based on confidence levels, balancing between overthinking and underthinking without requiring additional training.

10 retrieved papers

Can Refute

Plug-and-play solution improving efficiency and accuracy

Can Refute

10 retrieved papers

The authors validate that REBALANCE simultaneously reduces reasoning length and improves accuracy across multiple models (0.5B to 32B parameters) and nine benchmarks spanning math, science, commonsense, and coding tasks, providing a general and practical deployment strategy.

10 retrieved papers

Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[25] From "Aha Moments" to Controllable Thinking: Toward Meta-Cognitive Reasoning in Large Reasoning Models via Decoupled Reasoning and Control PDF

Ha Rui, Li, Chaozhuo, Pu Rui, Su Sen (2025) • arXiv.org

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Confidence as a continuous indicator of reasoning dynamics

[69] Concise: Confidence-guided compression in step-by-step efficient reasoning PDF

Can Refute

[1] Stop overthinking: A survey on efficient reasoning for large language models PDF

Cannot Refute

[2] Towards reasoning era: A survey of long chain-of-thought for reasoning large language models PDF

Cannot Refute

[5] On Reasoning Strength Planning in Large Reasoning Models PDF

Cannot Refute

[24] Let LLMs Break Free from Overthinking via Self-Braking Tuning PDF

Cannot Refute

[33] Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking PDF

Cannot Refute

[68] Alignment for efficient tool calling of large language models PDF

Cannot Refute

[70] Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know? PDF

Cannot Refute

[71] Lexical Hints of Accuracy in LLM Reasoning Chains PDF

Cannot Refute

[72] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models PDF

Cannot Refute

Contribution

REBALANCE framework for dynamic reasoning control

[53] SEAL: Steerable Reasoning Calibration of Large Language Models for Free PDF

Can Refute

[35] Efficient inference for large reasoning models: A survey PDF

Cannot Refute

[41] Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification PDF

Cannot Refute

[51] Deep think with confidence PDF

Cannot Refute

[52] Improving reasoning performance in large language models via representation engineering PDF

Cannot Refute

[54] Confident adaptive language modeling PDF

Cannot Refute

[55] Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute PDF

Cannot Refute

[56] Controlling Thinking Speed in Reasoning Models PDF

Cannot Refute

[57] Calibrating reasoning in language models with internal consistency PDF

Cannot Refute

[58] Why is spatial reasoning hard for vlms? an attention mechanism perspective on focus areas PDF

Cannot Refute

Contribution

Plug-and-play solution improving efficiency and accuracy

[23] Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement PDF

Can Refute

[61] Adaptthink: Reasoning models can learn when to think PDF

Can Refute

[59] Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training PDF

Cannot Refute

[60] InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency PDF

Cannot Refute

[62] Logit arithmetic elicits long reasoning capabilities without training PDF

Cannot Refute

[63] Mindstar: Enhancing math reasoning in pre-trained llms at inference time PDF

Cannot Refute

[64] Advancing language model reasoning through reinforcement learning and inference scaling PDF

Cannot Refute

[65] LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization PDF

Cannot Refute

[66] Thinking slow, fast: Scaling inference compute with distilled reasoners PDF

Cannot Refute

[67] Specreason: Fast and accurate inference-time compute via speculative reasoning PDF

Cannot Refute

Efficient Reasoning with Balanced Thinking

Overview

Overall Novelty Assessment

Taxonomy

Research Landscape Overview

Claimed Contributions

Core Task Comparisons

[25] From "Aha Moments" to Controllable Thinking: Toward Meta-Cognitive Reasoning in Large Reasoning Models via Decoupled Reasoning and Control PDF

Contribution Analysis

Confidence as a continuous indicator of reasoning dynamics

[69] Concise: Confidence-guided compression in step-by-step efficient reasoning PDF

[1] Stop overthinking: A survey on efficient reasoning for large language models PDF

[2] Towards reasoning era: A survey of long chain-of-thought for reasoning large language models PDF

[5] On Reasoning Strength Planning in Large Reasoning Models PDF

[24] Let LLMs Break Free from Overthinking via Self-Braking Tuning PDF

[33] Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking PDF

[68] Alignment for efficient tool calling of large language models PDF

[70] Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know? PDF

[71] Lexical Hints of Accuracy in LLM Reasoning Chains PDF

[72] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models PDF

REBALANCE framework for dynamic reasoning control

[53] SEAL: Steerable Reasoning Calibration of Large Language Models for Free PDF

[35] Efficient inference for large reasoning models: A survey PDF

[41] Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification PDF

[51] Deep think with confidence PDF

[52] Improving reasoning performance in large language models via representation engineering PDF

[54] Confident adaptive language modeling PDF

[55] Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute PDF

[56] Controlling Thinking Speed in Reasoning Models PDF

[57] Calibrating reasoning in language models with internal consistency PDF

[58] Why is spatial reasoning hard for vlms? an attention mechanism perspective on focus areas PDF

Plug-and-play solution improving efficiency and accuracy

[23] Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement PDF

[61] Adaptthink: Reasoning models can learn when to think PDF

[59] Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training PDF

[60] InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency PDF

[62] Logit arithmetic elicits long reasoning capabilities without training PDF

[63] Mindstar: Enhancing math reasoning in pre-trained llms at inference time PDF

[64] Advancing language model reasoning through reinforcement learning and inference scaling PDF

[65] LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization PDF

[66] Thinking slow, fast: Scaling inference compute with distilled reasoners PDF

[67] Specreason: Fast and accurate inference-time compute via speculative reasoning PDF

Table of Contents