Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Confidence Calibration, Uncertainty Estimation, Large Language Models
Abstract:

A safe and trustworthy use of Large Language Models (LLMs) requires an accurate expression of confidence in their answers. We propose a novel Reinforcement Learning approach that directly fine-tunes LLMs to express calibrated confidence estimates alongside their answers to factual questions. Our method optimizes a reward based on the logarithmic scoring rule, explicitly penalizing both over- and under-confidence, which encourages the model to align its confidence estimates with its actual predictive accuracy. The optimal policy under our reward design yields perfectly calibrated confidence expressions. Unlike prior approaches that decouple confidence estimation from response generation, our method integrates confidence calibration seamlessly into the generative process of the LLM. Empirically, we demonstrate that models trained with our approach exhibit substantially improved calibration and generalize to unseen tasks without further fine-tuning, suggesting the emergence of a general confidence awareness. We provide our training and evaluation code in the supplementary material and will make it publicly available upon acceptance.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a reinforcement learning method to fine-tune LLMs for calibrated confidence expression using a logarithmic scoring rule reward. It resides in the 'Reinforcement Learning-Based Calibration' leaf, which contains only two papers: the paper under review and one sibling (Rewarding Doubt RL). This is a sparse direction within the broader taxonomy of 50 papers across 36 topics, suggesting that RL-based calibration remains relatively underexplored compared to supervised fine-tuning or post-hoc methods.

The taxonomy reveals neighboring approaches in sibling leaves: 'Fine-Tuning for Calibration' (four papers using supervised methods), 'Post-Hoc Calibration Methods' (three papers applying rescaling without parameter updates), and 'Prompting-Based Calibration' (three papers using strategic prompt design). The paper diverges from these by integrating confidence calibration directly into the generative process via RL rather than decoupling estimation from generation. Its position under 'Calibration Techniques and Training Methods' distinguishes it from elicitation-focused work in other branches that extract confidence without training interventions.

Among 30 candidates examined, the core RL approach shows substantial prior work: five of ten candidates can refute Contribution A. The logarithmic scoring rule reward (Contribution B) appears more distinctive, with only one refutable candidate among ten examined. Generalization to unseen tasks (Contribution C) faces moderate overlap, with four of ten candidates providing relevant prior work. The limited search scope means these statistics reflect top-K semantic matches and citation expansion, not exhaustive coverage of the field.

The analysis suggests mixed novelty: the RL framing has precedent in the sparse two-paper leaf, while the specific reward design and generalization claims show less direct overlap among examined candidates. The small leaf size indicates an emerging research direction, though the presence of a closely related sibling paper and multiple refutable candidates for the core contribution tempers claims of fundamental novelty within the limited search scope.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Papers: 10

Research Landscape Overview

Core task: Calibrated confidence expression in large language model responses. The field addresses how to ensure that when LLMs express uncertainty or confidence, these expressions align with actual correctness rates. The taxonomy organizes work into several major branches:

- Confidence Elicitation Methods explore how to extract uncertainty signals from models, e.g., verbalized probabilities (Verbalized Confidence Scores[25]) or consistency-based measures.
- Calibration Techniques and Training Methods develop approaches to improve alignment between expressed and true confidence, including supervised fine-tuning (Simple Supervised Uncertainty[12]), reinforcement learning strategies (Rewarding Doubt RL[34]), and post-hoc adjustments (Calibration Tuning[14]).
- Evaluation and Benchmarking establish metrics and datasets to measure calibration quality across domains such as question answering (QA Calibration[47]) and clinical settings (Clinical Confidence Benchmarking[32]).
- Theoretical Foundations examine the mathematical underpinnings and failure modes.
- Specialized Applications extend calibration to retrieval-augmented generation (RAG Uncertainty[33]) and biomedical contexts (Biomedical Calibration[20]).
- Foundational Uncertainty Quantification provides broader surveys and conceptual frameworks (Uncertainty Quantification Survey[1]).

Recent work reveals contrasting philosophies: some methods elicit natural language confidence statements and then calibrate them through training (Express Uncertainty[3], Just Ask Calibration[6]), while others leverage internal model signals such as token probabilities or sampling consistency (Self-Consistency Confidence[29]). A particularly active line uses reinforcement learning to directly reward well-calibrated doubt, as seen in Rewarding Doubt[0] and its closely related predecessor Rewarding Doubt RL[34], both of which frame calibration as an RL objective rather than a supervised regression problem.
This RL-based approach contrasts with simpler supervised methods (Simple Supervised Uncertainty[12]) that train on labeled correctness data, and with post-hoc recalibration techniques (Calibration Tuning[14]) that adjust outputs without retraining. Rewarding Doubt[0] sits squarely within the reinforcement learning calibration cluster, emphasizing adaptive learning of doubt expressions through reward signals, distinguishing it from static elicitation or one-shot tuning strategies prevalent elsewhere in the taxonomy.

Claimed Contributions

Reinforcement learning approach for calibrated confidence expression in LLMs

The authors introduce a reinforcement learning method that trains large language models to generate calibrated numerical confidence scores together with their answers. Unlike prior work that decouples confidence estimation from generation, this approach integrates confidence calibration seamlessly into the LLM's generative process.
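To make the integration concrete, the generate-then-score loop can be sketched as a REINFORCE-style update. This is an illustrative sketch, not the paper's implementation: the `Confidence: 0.8` output format, the `sample_fn`/`logprob_grad_fn`/`is_correct_fn` hooks, and the plain policy-gradient step are all assumptions (the actual training setup may use a different RL algorithm, e.g. PPO).

```python
import math

def parse_confidence(generated: str) -> float:
    """Extract the trailing numeric confidence from an output such as
    'Paris. Confidence: 0.8' (a hypothetical output format)."""
    return float(generated.rsplit("Confidence:", 1)[1].strip())

def log_score_reward(confidence: float, correct: bool, eps: float = 1e-6) -> float:
    """Logarithmic scoring rule: log(c) if correct, log(1 - c) otherwise."""
    c = min(max(confidence, eps), 1.0 - eps)  # clip away from 0 and 1
    return math.log(c) if correct else math.log(1.0 - c)

def reinforce_step(sample_fn, logprob_grad_fn, is_correct_fn, question):
    """One REINFORCE-style update: sample a response carrying both an answer
    and a confidence, score it, and scale the gradient of the sequence
    log-likelihood by the reward (a stochastic policy-gradient estimate)."""
    generated = sample_fn(question)
    r = log_score_reward(parse_confidence(generated),
                         is_correct_fn(question, generated))
    return [r * g for g in logprob_grad_fn(question, generated)]

# Stubbed-out model for illustration: the "gradient" is a fixed vector.
grads = reinforce_step(
    sample_fn=lambda q: "Paris. Confidence: 0.8",
    logprob_grad_fn=lambda q, g: [1.0, -0.5],  # stand-in for grad log pi(g | q)
    is_correct_fn=lambda q, g: g.startswith("Paris"),
    question="What is the capital of France?",
)
```

Because answer and confidence are emitted in a single generated sequence, the same gradient signal shapes both, which is what distinguishes this from pipelines that train a separate confidence head.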

Retrieved papers: 10 · Verdict: Can Refute
Reward function based on logarithmic scoring rule

The authors design a reward function using the logarithmic scoring rule that penalizes overconfidence and underconfidence. This proper scoring rule ensures that the optimal policy results in perfectly calibrated confidence expressions, as the expected reward is maximized when predicted confidence equals the true epistemic probability.
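The propriety claim above can be checked numerically. A minimal sketch (the function names are illustrative, not the paper's): the expected log-score reward over a confidence grid peaks exactly where the stated confidence equals the true accuracy.

```python
import math

def log_score_reward(confidence: float, correct: bool, eps: float = 1e-6) -> float:
    """log(c) on a correct answer, log(1 - c) on an incorrect one."""
    c = min(max(confidence, eps), 1.0 - eps)  # clip to avoid log(0)
    return math.log(c) if correct else math.log(1.0 - c)

def expected_reward(confidence: float, p_correct: float) -> float:
    """Expected reward when the answer is correct with true probability p_correct."""
    return (p_correct * log_score_reward(confidence, True)
            + (1.0 - p_correct) * log_score_reward(confidence, False))

# Propriety of the log score: with true accuracy 0.7, the grid maximizer
# of the expected reward is the calibrated confidence 0.7.
p_true = 0.7
grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=lambda c: expected_reward(c, p_true))  # best == 0.7
```

Setting the derivative of p*log(c) + (1-p)*log(1-c) to zero gives c = p, which is why the optimal policy under this reward is perfectly calibrated.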

Retrieved papers: 10 · Verdict: Can Refute
Generalization of confidence awareness to unseen tasks

The authors demonstrate that their method enables models to generalize their learned confidence calibration abilities to out-of-domain datasets without additional fine-tuning. This suggests the model develops a general awareness of its own uncertainty rather than task-specific calibration patterns.

Retrieved papers: 10 · Verdict: Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Reinforcement learning approach for calibrated confidence expression in LLMs

Contribution 2: Reward function based on logarithmic scoring rule

Contribution 3: Generalization of confidence awareness to unseen tasks
