Efficient Reasoning with Balanced Thinking

ICLR 2026 Conference Submission
Anonymous Authors
Large Reasoning Models · Efficient Reasoning
Abstract:

Large Reasoning Models (LRMs) have shown remarkable reasoning capabilities, yet they often suffer from overthinking, expending redundant computational steps on simple problems, or underthinking, failing to explore sufficient reasoning paths despite having the capability to do so. These issues lead to inefficiency and potential inaccuracy, limiting practical deployment in resource-constrained settings. Existing methods for mitigating overthinking, such as suppressing reflective keywords or capping reasoning length, may inadvertently induce underthinking and thus compromise accuracy. We therefore propose ReBalance, a training-free framework that achieves efficient reasoning with balanced thinking. ReBalance leverages confidence as a continuous indicator of reasoning dynamics, identifying overthinking through high confidence variance and underthinking through consistent overconfidence. By aggregating hidden states from a small-scale dataset into reasoning-mode prototypes, we compute a steering vector that guides the LRM's reasoning trajectory. A dynamic control function modulates this vector's strength and direction based on real-time confidence, pruning redundancy during overthinking and promoting exploration during underthinking. Extensive experiments on four models ranging from 0.5B to 32B parameters and across nine benchmarks in math reasoning, general question answering, and coding demonstrate that ReBalance reduces output redundancy while improving accuracy, offering a general, training-free, plug-and-play strategy for efficient and robust LRM deployment. Code and models will be made publicly available.
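The prototype-and-steering construction described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the 3-dimensional "hidden states", the labeling of traces as overthinking vs. efficient, and the normalized-difference construction are all assumptions made for clarity.

```python
# Hedged sketch of prototype aggregation: average hidden states from traces
# labeled with each reasoning mode, then take the normalized difference of
# the two prototypes as a steering vector.
import math

def mean_vector(states):
    """Elementwise mean of a list of equal-length hidden-state vectors."""
    n = len(states)
    return [sum(s[i] for s in states) / n for i in range(len(states[0]))]

def steering_vector(overthink_states, efficient_states):
    """Unit vector pointing from the overthinking prototype toward the
    efficient-reasoning prototype (one plausible construction)."""
    p_over = mean_vector(overthink_states)
    p_eff = mean_vector(efficient_states)
    diff = [a - b for a, b in zip(p_eff, p_over)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

# Toy 3-d "hidden states" from a small labeled dataset (made up).
over = [[1.0, 0.0, 2.0], [1.2, 0.2, 1.8]]
eff = [[0.0, 1.0, 1.0], [0.2, 0.8, 1.2]]
v = steering_vector(over, eff)
print([round(x, 3) for x in v])  # unit-norm direction between the two modes
```

In a real model the states would come from a chosen transformer layer and the vector would be added to activations during decoding; here the output is only meant to show that the construction yields a unit-norm direction.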

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ReBalance, a training-free framework that uses confidence signals to detect and mitigate both overthinking and underthinking in large reasoning models. It resides in the 'Confidence-Based Reasoning Control' leaf under 'Adaptive Reasoning Control and Difficulty-Aware Methods', a leaf that contains only two papers, including this one. This is a relatively sparse research direction within a broader taxonomy of 50 papers across 22 leaf nodes, suggesting that the specific approach of using confidence variance and overconfidence patterns for dual-sided reasoning control is not yet heavily explored.

The taxonomy reveals that ReBalance sits within a larger ecosystem of adaptive reasoning methods. Its parent branch includes 'Difficulty-Adaptive Reasoning Frameworks' (3 papers) and 'Dynamic Early Exit and Truncation Mechanisms' (2 papers), which address similar efficiency goals through different signals—explicit difficulty modeling or self-termination criteria. Neighboring branches include 'Reinforcement Learning and Training-Based Optimization' (5 papers) and 'Post-Hoc Reasoning Refinement' (3 papers), which tackle reasoning efficiency through training interventions or post-generation filtering rather than inference-time steering. The taxonomy's scope notes clarify that confidence-based methods like ReBalance are distinguished from difficulty-aware approaches by their reliance on model uncertainty rather than upfront problem characterization.

Among the 30 candidates examined, the contribution-level analysis shows mixed novelty signals. For 'Confidence as a continuous indicator', 10 candidates were examined and 1 refutable match was found, suggesting some prior work uses confidence for reasoning control, though perhaps not in the dual-detection manner proposed here. The 'ReBalance framework' contribution likewise had 10 candidates examined and 1 refutable match, indicating the specific steering-vector approach may have precedent. The 'plug-and-play solution' contribution had 10 candidates examined and 2 refutable matches, suggesting training-free efficiency improvements are more established. These statistics reflect a limited search scope, not exhaustive coverage of the field.

Given the sparse population of the confidence-based control leaf and the limited search scale, the work appears to occupy a relatively underexplored niche within adaptive reasoning methods. The analysis captures top-30 semantic matches and does not guarantee comprehensive coverage of all relevant prior work, particularly in adjacent areas like difficulty-aware frameworks or latent reasoning representations. The dual focus on both overthinking and underthinking through confidence dynamics may differentiate this work from single-sided approaches, though the limited refutable matches suggest some conceptual overlap exists within the examined candidates.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 4

Research Landscape Overview

Core task: balancing overthinking and underthinking in large reasoning models. The field has organized itself around several complementary perspectives. One major branch focuses on characterizing and analyzing reasoning inefficiencies: understanding when models generate excessive or insufficient reasoning steps and why these patterns emerge. Another set of approaches centers on adaptive reasoning control and difficulty-aware methods, which dynamically adjust computational effort based on problem characteristics; works like DAST[3] and Reasoning Strength Planning[5] exemplify this direction. Reinforcement learning and training-based optimization form a third pillar, aiming to teach models efficient reasoning habits during training. Meanwhile, latent and compressed reasoning representations explore whether reasoning can be internalized or condensed, and post-hoc refinement methods attempt to optimize reasoning traces after generation. System-level efficiency and serving optimization address deployment concerns, while evaluation frameworks and benchmarking provide standardized testbeds for comparing approaches. Security and adversarial work examines vulnerabilities introduced by reasoning processes, and specialized applications demonstrate domain-specific tuning.

Within the adaptive control branch, a particularly active line of work uses confidence signals to decide when to stop reasoning. Balanced Thinking[0] sits squarely in this confidence-based reasoning control cluster, alongside Meta-Cognitive Reasoning[25], which similarly leverages model self-assessment to regulate thinking depth. These methods contrast with difficulty-aware approaches like DAST[3], which predict problem hardness upfront rather than monitoring confidence during generation.
The central tension across these branches involves trade-offs between accuracy, efficiency, and robustness: some methods prioritize minimizing wasted computation on easy problems, while others focus on ensuring sufficient reasoning for hard cases. Balanced Thinking[0] addresses this by using confidence thresholds to terminate reasoning adaptively, positioning itself as a middle ground that responds to the model's own uncertainty rather than relying solely on external difficulty estimates or fixed budgets.
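The confidence-threshold termination mentioned above might look like the following minimal sketch. The window size, thresholds, and stability criterion are illustrative guesses, not any paper's actual procedure.

```python
# Toy adaptive-termination rule: stop emitting further reasoning steps once
# the recent step confidences are both high and stable (low variance).
from statistics import mean, pvariance

def should_stop(confidences, window=3, conf_thresh=0.9, var_thresh=1e-3):
    """Return True when the last `window` step confidences are all
    averaged above conf_thresh and nearly constant."""
    if len(confidences) < window:
        return False
    recent = confidences[-window:]
    return mean(recent) >= conf_thresh and pvariance(recent) <= var_thresh

# Simulated per-step confidences of one reasoning trace (made up).
trace = [0.55, 0.70, 0.82, 0.91, 0.93, 0.92, 0.94]
for t in range(1, len(trace) + 1):
    if should_stop(trace[:t]):
        print(f"stop after step {t}")  # further steps would be redundant
        break
```

With the toy trace above, the loop halts after step 6, once the last three confidences are high and nearly constant; a fixed-budget baseline would have continued regardless.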

Claimed Contributions

Confidence as a continuous indicator of reasoning dynamics

The authors demonstrate that stepwise confidence and confidence variance can reliably indicate when large reasoning models exhibit overthinking (high variance) or underthinking (persistent overconfidence), providing a foundation for dynamic control of reasoning behavior.

10 retrieved papers
Can Refute
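One plausible instantiation of the detection signal claimed above, with illustrative thresholds; the paper's exact definitions of step confidence and its variance statistic are not reproduced here and these numbers are assumptions.

```python
# Hedged sketch: mean token probability per reasoning step as "confidence",
# high variance across steps flags overthinking, uniformly high confidence
# flags underthinking.
from statistics import mean, pvariance

def step_confidence(token_probs):
    """Confidence of one reasoning step = mean probability of its tokens
    (one plausible choice; other aggregations are possible)."""
    return mean(token_probs)

def diagnose(step_confidences, var_thresh=0.02, overconf_thresh=0.9):
    if pvariance(step_confidences) > var_thresh:
        return "overthinking"   # oscillating confidence: redundant re-checking
    if min(step_confidences) > overconf_thresh:
        return "underthinking"  # persistent overconfidence: too little exploration
    return "balanced"

oscillating = [0.95, 0.55, 0.90, 0.50, 0.92]   # high-variance trace
overconfident = [0.96, 0.97, 0.95, 0.98]       # consistently high
steady = [0.75, 0.80, 0.78, 0.82]              # moderate and stable

print(diagnose(oscillating))    # overthinking
print(diagnose(overconfident))  # underthinking
print(diagnose(steady))         # balanced
```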
REBALANCE framework for dynamic reasoning control

The authors introduce REBALANCE, a training-free method that extracts steering vectors from hidden states and applies a dynamic control function to adjust reasoning trajectories in real-time based on confidence levels, balancing between overthinking and underthinking without requiring additional training.

10 retrieved papers
Can Refute
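The dynamic control function described in this contribution could be sketched as follows. This is not the authors' code: the piecewise-linear form, the thresholds, max_alpha, and the stand-in 2-d steering vector are all assumptions chosen to illustrate sign-and-magnitude control.

```python
# Hedged sketch: the sign and magnitude of the activation edit follow the
# model's current confidence; positive strength prunes redundancy in the
# overthinking regime, negative strength promotes exploration in the
# underthinking regime, and the balanced regime is left untouched.
import math

steer = [0.6, -0.8]  # stand-in unit steering vector (||steer|| = 1)

def control_strength(conf, low=0.4, high=0.9, max_alpha=2.0):
    """Hypothetical piecewise-linear control of steering strength."""
    if conf >= high:
        return max_alpha * (conf - high) / (1.0 - high)   # prune redundancy
    if conf <= low:
        return -max_alpha * (low - conf) / low            # promote exploration
    return 0.0                                            # balanced: no edit

def apply_steering(hidden, conf):
    """Add the scaled steering vector to a hidden-state vector."""
    a = control_strength(conf)
    return [h + a * s for h, s in zip(hidden, steer)]

hidden = [1.0, 1.0]
for c in (0.95, 0.65, 0.20):
    shifted = apply_steering(hidden, c)
    shift = math.sqrt(sum((x - y) ** 2 for x, y in zip(shifted, hidden)))
    print(f"confidence={c:.2f}  strength={control_strength(c):+.2f}  shift={shift:.2f}")
```

The point of the sketch is that a single function maps real-time confidence to both the direction and the size of the intervention, which is what makes the same mechanism address overthinking and underthinking.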
Plug-and-play solution improving efficiency and accuracy

The authors validate that REBALANCE simultaneously reduces reasoning length and improves accuracy across multiple models (0.5B to 32B parameters) and nine benchmarks spanning math, science, commonsense, and coding tasks, providing a general and practical deployment strategy.

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Confidence as a continuous indicator of reasoning dynamics
Contribution 2: REBALANCE framework for dynamic reasoning control
Contribution 3: Plug-and-play solution improving efficiency and accuracy

(Each contribution is described in full under Claimed Contributions above.)