Teaching LLMs to Admit Uncertainty in OCR

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Optical Character Recognition, Visually Degraded Document, Uncertainty, LLM
Abstract:

Vision-language models (VLMs) are increasingly replacing traditional OCR pipelines, but on visually degraded documents they often hallucinate, producing fluent yet incorrect text without signaling uncertainty. This occurs because current post-training emphasizes accuracy, which encourages models to guess even when uncertain. The problem persists in state-of-the-art systems and severely undermines OCR reliability. To improve the trustworthiness of OCR on degraded documents, we propose uncertainty-aware OCR: rather than suppressing guesses, our model transcribes while explicitly bracketing spans it deems unreliable with uncertainty tags. We define usage rules for the uncertainty tags and a corresponding evaluation protocol. To train the model, we use Group Relative Policy Optimization (GRPO) with a pseudo-labeled cold start and a multi-objective reward that balances transcription accuracy and uncertainty coverage while preventing reward hacking. We study different combinations of cold start and reward granularity, and verify that the reward parameters both prevent reward hacking and improve the corresponding metrics. We also introduce Blur-OCR, a challenging benchmark for uncertainty-aware OCR on degraded documents. In detailed experiments, our model maintains transcription accuracy while achieving an uncertainty-tag F1 score of 0.685, substantially outperforming both open- and closed-source baselines.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes uncertainty-aware OCR that transcribes degraded documents while bracketing uncertain spans with explicit uncertainty tags, trained via Group Relative Policy Optimization. It resides in the 'Uncertainty-Aware OCR with Explicit Tagging' leaf, which currently contains only this paper as a singleton. This indicates a sparse research direction within the broader 'Uncertainty Quantification and Confidence Estimation in OCR' branch, which itself contains just three leaves across the entire taxonomy of fourteen papers. The work thus occupies a relatively unexplored niche within the field.

The taxonomy tree reveals neighboring leaves focused on entropy-based uncertainty measurement and confidence-based estimation, which compute scores or metrics rather than producing explicit markup. The broader taxonomy includes parallel branches for noise-robust learning, document restoration, and hallucination mitigation in multimodal models. The paper's approach diverges from these by embedding uncertainty signals directly into the transcription output rather than relying on preprocessing, post-hoc correction, or implicit scoring. Its use of GRPO and multi-objective rewards connects it to reinforcement learning strategies, a methodology not prominently represented in the surveyed taxonomy.

Among twenty-one candidates examined, no contribution was clearly refuted. For the uncertainty-aware paradigm with UNC tags, ten candidates were reviewed with zero refutable overlaps; the training method examined seven candidates with none refutable; and the Blur-OCR benchmark examined four candidates with none refutable. This suggests that within the limited semantic search scope, no prior work explicitly combines uncertainty tagging, GRPO-based training, and a dedicated degraded-OCR benchmark in the manner proposed. The statistics reflect a focused search rather than exhaustive coverage, so unexplored related work may exist beyond the top-K matches.

Based on the limited search scope of twenty-one semantically similar papers, the work appears to introduce a distinctive paradigm shift from implicit confidence scoring to explicit uncertainty markup in OCR outputs. The singleton taxonomy position and absence of refutable prior work among examined candidates suggest novelty, though the small search scale means the analysis cannot rule out related efforts in adjacent literatures or less semantically similar domains. The Blur-OCR benchmark and GRPO-based training further differentiate the contribution within the examined set.

Taxonomy

- Core-task Taxonomy Papers: 14
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 21
- Refutable Papers: 0

Research Landscape Overview

Core task: uncertainty-aware optical character recognition on degraded documents. The field addresses the challenge of extracting text from documents where visual quality is compromised by noise, aging, or other degradation factors, while simultaneously quantifying the reliability of recognition outputs.

The taxonomy reveals several complementary research directions. One major branch focuses on uncertainty quantification and confidence estimation, developing methods that explicitly tag or score OCR predictions to indicate reliability. Another branch emphasizes noise-robust learning and adaptation, training models to handle domain shifts and noisy labels. Document restoration and enhancement techniques form a third branch, preprocessing degraded inputs to improve downstream recognition. Additional branches address hallucination mitigation in multimodal models, specialized pipelines for historical and multilingual corpora, and multi-stage systems for complex handwritten forms. These branches reflect a spectrum from preprocessing-centric approaches to end-to-end learning strategies that internalize robustness.

Recent work has explored diverse strategies for managing uncertainty and degradation. Some studies leverage large language models to estimate entropy or confidence scores for OCR outputs, as seen in GPT Entropy OCR[1], while others integrate confidence measures directly into training pipelines to handle dual noise sources, exemplified by Confidence Dual Noises[2]. Preprocessing methods like Text Super Resolution[3] and Student Essay Restoration[4] aim to recover legibility before recognition, whereas Mitigating OCR Hallucinations[5] tackles post-hoc correction of erroneous outputs. Teaching LLMs Uncertainty[0] sits within the uncertainty quantification branch, specifically focusing on explicit tagging of uncertain tokens. Its emphasis on teaching models to recognize and flag their own uncertainty aligns closely with confidence-aware approaches like Confidence OCR Error[6] and Neural Keyword Confidence[9], yet it distinguishes itself by leveraging large language model capabilities rather than traditional confidence scoring mechanisms. This positions the work at the intersection of modern generative models and classical OCR reliability concerns.

Claimed Contributions

Uncertainty-aware OCR paradigm with UNC tags

The authors propose a new OCR approach where vision language models explicitly bracket uncertain or potentially erroneous text spans with special uncertainty tags during transcription, rather than suppressing guesses or producing overconfident hallucinations on degraded documents.

10 retrieved papers
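To make the tagging paradigm concrete, the sketch below parses a transcription whose unreliable spans are wrapped in hypothetical `<unc>...</unc>` markers and scores predicted spans against gold spans with exact-match span F1. The tag syntax, the exact-match rule, and all function names are assumptions for illustration, not the paper's actual tag format or evaluation protocol.

```python
import re

# Hypothetical tag syntax: uncertain spans are wrapped as <unc>...</unc>.
# The paper's actual tag format may differ; this is an illustrative sketch.
TAG_RE = re.compile(r"<unc>(.*?)</unc>")

def extract_spans(tagged_text):
    """Split a tagged transcription into plain text plus the character
    offsets (start, end) of every span marked uncertain."""
    plain_parts, spans, pos = [], [], 0
    for m in TAG_RE.finditer(tagged_text):
        plain_parts.append(tagged_text[pos:m.start()])
        start = sum(len(p) for p in plain_parts)
        plain_parts.append(m.group(1))
        spans.append((start, start + len(m.group(1))))
        pos = m.end()
    plain_parts.append(tagged_text[pos:])
    return "".join(plain_parts), spans

def span_f1(pred_spans, gold_spans):
    """Exact-match span F1: a predicted span counts as a true positive
    only if it matches a gold span exactly."""
    pred, gold = set(pred_spans), set(gold_spans)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    precision, recall = tp / len(pred), tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this convention, a model output like `"the <unc>qu1ck</unc> fox"` yields the plain transcription plus a span covering the unreliable characters, which can then be compared against annotated spans.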
Training method combining pseudo-labeled cold start and GRPO

The authors develop a two-phase training pipeline: first, pseudo-labeled supervision in which a base model's errors are automatically tagged as a cold start; then, reinforcement learning with GRPO and a composite reward function that jointly optimizes transcription accuracy and uncertainty-tagging quality.

7 retrieved papers
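The two-phase pipeline described above can be sketched in three pieces: a pseudo-labeling step that tags wherever a base model's transcription diverges from the reference, a composite reward, and GRPO's group-relative advantage normalization. The `<unc>` tag syntax, the specific reward terms, and the weights are illustrative assumptions; the paper's actual reward additionally guards against reward hacking in ways not detailed in this report.

```python
import difflib

def pseudo_label(base_output, reference):
    """Cold-start pseudo-labeling sketch: wrap spans where the base model's
    transcription disagrees with the reference in hypothetical <unc> tags."""
    sm = difflib.SequenceMatcher(a=reference, b=base_output, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            out.append(base_output[j1:j2])
        elif j2 > j1:  # model produced text that differs from the reference
            out.append(f"<unc>{base_output[j1:j2]}</unc>")
    return "".join(out)

def composite_reward(pred_plain, reference, tag_f1, alpha=0.5, beta=0.5):
    """Illustrative multi-objective reward: a transcription-accuracy term
    (similarity ratio, a cheap stand-in for 1 - normalized edit distance)
    plus an uncertainty-tagging term (span F1), with assumed weights."""
    sm = difflib.SequenceMatcher(a=reference, b=pred_plain, autojunk=False)
    return alpha * sm.ratio() + beta * tag_f1

def grpo_advantages(group_rewards):
    """GRPO-style advantages: normalize each sampled completion's reward
    by the mean and standard deviation of its sampling group."""
    mean = sum(group_rewards) / len(group_rewards)
    var = sum((r - mean) ** 2 for r in group_rewards) / len(group_rewards)
    return [(r - mean) / (var ** 0.5 + 1e-8) for r in group_rewards]
```

For example, `pseudo_label("the qu1ck fox", "the quick fox")` tags only the mismatched character, giving the base model a supervised target that marks its own errors as uncertain.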
Blur-OCR benchmark dataset

The authors create a new benchmark dataset consisting of 107,520 training images and 2,048 evaluation images with diverse synthetic degradations applied to document images, designed specifically to evaluate uncertainty-aware OCR systems on degraded documents.

4 retrieved papers
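A minimal sketch of how such a synthetic-degradation suite might be parameterized, assuming hypothetical degradation families and parameter ranges (the actual Blur-OCR generation pipeline and its settings are not specified in this report):

```python
import random

# Hypothetical degradation families and parameter ranges; the real
# Blur-OCR settings may differ.
DEGRADATIONS = {
    "gaussian_blur": {"sigma": (0.5, 3.0)},
    "gaussian_noise": {"std": (5.0, 30.0)},
    "jpeg_compression": {"quality": (10, 50)},
    "low_resolution": {"scale": (0.25, 0.75)},
}

def sample_degradation_config(rng, max_ops=2):
    """Sample a random stack of degradations to apply to one clean
    document image, with parameters drawn uniformly from each range."""
    ops = rng.sample(sorted(DEGRADATIONS), k=rng.randint(1, max_ops))
    config = {}
    for op in ops:
        params = {}
        for name, (lo, hi) in DEGRADATIONS[op].items():
            params[name] = rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)
        config[op] = params
    return config
```

Sampling one such configuration per clean page, then rendering it with an image library, would yield the kind of diverse degraded training and evaluation images the benchmark describes.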

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though that signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Uncertainty-aware OCR paradigm with UNC tags

Contribution

Training method combining pseudo-labeled cold start and GRPO

Contribution

Blur-OCR benchmark dataset
