Teaching LLMs to Admit Uncertainty in OCR

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Optical Character Recognition, Visually Degraded Document, Uncertainty, LLM
Abstract:

Vision-language models (VLMs) are increasingly replacing traditional OCR pipelines, but on visually degraded documents they often hallucinate, producing fluent yet incorrect text without signaling uncertainty. This occurs because current post-training emphasizes accuracy, which encourages models to guess even when uncertain. The problem persists in state-of-the-art systems and severely undermines OCR reliability. To improve the trustworthiness of OCR on degraded documents, we propose uncertainty-aware OCR: rather than suppressing guesses, our model transcribes while explicitly bracketing spans it deems unreliable with uncertainty tags. We define usage rules for the uncertainty tags and a corresponding evaluation protocol. To train the model, we use Group Relative Policy Optimization (GRPO) with a pseudo-labeled cold start and a multi-objective reward that balances transcription accuracy and uncertainty coverage while preventing reward hacking. We study different combinations of cold start and reward granularity, and verify that the reward parameters both prevent reward hacking and improve the corresponding metrics. We also introduce Blur-OCR, a challenging benchmark for uncertainty-aware OCR on degraded documents. In detailed experiments, our model maintains transcription accuracy while achieving an uncertainty-tag F1 score of 0.685, substantially outperforming both open- and closed-source baselines.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes uncertainty-aware OCR that transcribes degraded documents while bracketing uncertain spans with explicit uncertainty tags, trained via Group Relative Policy Optimization. It resides in the 'Uncertainty-Aware OCR with Explicit Tagging' leaf, which currently contains only this paper as a singleton. This indicates a sparse research direction within the broader 'Uncertainty Quantification and Confidence Estimation in OCR' branch, which itself contains just three leaves across the entire taxonomy of fourteen papers. The work thus occupies a relatively unexplored niche within the field.

The taxonomy tree reveals neighboring leaves focused on entropy-based uncertainty measurement and confidence-based estimation, which compute scores or metrics rather than producing explicit markup. The broader taxonomy includes parallel branches for noise-robust learning, document restoration, and hallucination mitigation in multimodal models. The paper's approach diverges from these by embedding uncertainty signals directly into the transcription output rather than relying on preprocessing, post-hoc correction, or implicit scoring. Its use of GRPO and multi-objective rewards connects it to reinforcement learning strategies, a methodology not prominently represented in the surveyed taxonomy.

Among twenty-one candidates examined, no contribution was clearly refuted. For the uncertainty-aware paradigm with UNC tags, ten candidates were reviewed with zero refutable overlaps; the training method examined seven candidates with none refutable; and the Blur-OCR benchmark examined four candidates with none refutable. This suggests that within the limited semantic search scope, no prior work explicitly combines uncertainty tagging, GRPO-based training, and a dedicated degraded-OCR benchmark in the manner proposed. The statistics reflect a focused search rather than exhaustive coverage, so unexplored related work may exist beyond the top-K matches.

Based on the limited search scope of twenty-one semantically similar papers, the work appears to introduce a distinctive paradigm shift from implicit confidence scoring to explicit uncertainty markup in OCR outputs. The singleton taxonomy position and absence of refutable prior work among examined candidates suggest novelty, though the small search scale means the analysis cannot rule out related efforts in adjacent literatures or less semantically similar domains. The Blur-OCR benchmark and GRPO-based training further differentiate the contribution within the examined set.

Taxonomy

- Core-task Taxonomy Papers: 14
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 21
- Refutable Papers: 0

Research Landscape Overview

Core task: uncertainty-aware optical character recognition on degraded documents. The field addresses the challenge of extracting text from documents where visual quality is compromised by noise, aging, or other degradation factors, while simultaneously quantifying the reliability of recognition outputs.

The taxonomy reveals several complementary research directions. One major branch focuses on uncertainty quantification and confidence estimation, developing methods that explicitly tag or score OCR predictions to indicate reliability. Another branch emphasizes noise-robust learning and adaptation, training models to handle domain shifts and noisy labels. Document restoration and enhancement techniques form a third branch, preprocessing degraded inputs to improve downstream recognition. Additional branches address hallucination mitigation in multimodal models, specialized pipelines for historical and multilingual corpora, and multi-stage systems for complex handwritten forms. These branches reflect a spectrum from preprocessing-centric approaches to end-to-end learning strategies that internalize robustness.

Recent work has explored diverse strategies for managing uncertainty and degradation. Some studies leverage large language models to estimate entropy or confidence scores for OCR outputs, as seen in GPT Entropy OCR[1], while others integrate confidence measures directly into training pipelines to handle dual noise sources, exemplified by Confidence Dual Noises[2]. Preprocessing methods like Text Super Resolution[3] and Student Essay Restoration[4] aim to recover legibility before recognition, whereas Mitigating OCR Hallucinations[5] tackles post-hoc correction of erroneous outputs. Teaching LLMs Uncertainty[0] sits within the uncertainty quantification branch, specifically focusing on explicit tagging of uncertain tokens. Its emphasis on teaching models to recognize and flag their own uncertainty aligns closely with confidence-aware approaches like Confidence OCR Error[6] and Neural Keyword Confidence[9], yet it distinguishes itself by leveraging large language model capabilities rather than traditional confidence scoring mechanisms. This positions the work at the intersection of modern generative models and classical OCR reliability concerns.

Claimed Contributions

Uncertainty-aware OCR paradigm with UNC tags

The authors propose a new OCR approach where vision language models explicitly bracket uncertain or potentially erroneous text spans with special uncertainty tags during transcription, rather than suppressing guesses or producing overconfident hallucinations on degraded documents.

10 retrieved papers
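To make the tagging paradigm concrete, the sketch below parses a transcription whose unreliable spans are wrapped in hypothetical `<unc>...</unc>` markers and scores predicted spans against gold spans with exact-match span F1. The tag syntax, the exact-match rule, and all function names are assumptions for illustration, not the paper's actual tag format or evaluation protocol.

```python
import re

# Hypothetical tag syntax: uncertain spans are wrapped as <unc>...</unc>.
# The paper's actual tag format may differ; this is an illustrative sketch.
TAG_RE = re.compile(r"<unc>(.*?)</unc>")

def extract_spans(tagged_text):
    """Split a tagged transcription into plain text plus the character
    offsets (start, end) of every span marked uncertain."""
    plain_parts, spans, pos = [], [], 0
    for m in TAG_RE.finditer(tagged_text):
        plain_parts.append(tagged_text[pos:m.start()])
        start = sum(len(p) for p in plain_parts)
        plain_parts.append(m.group(1))
        spans.append((start, start + len(m.group(1))))
        pos = m.end()
    plain_parts.append(tagged_text[pos:])
    return "".join(plain_parts), spans

def span_f1(pred_spans, gold_spans):
    """Exact-match span F1: a predicted span counts as a true positive
    only if it matches a gold span exactly."""
    pred, gold = set(pred_spans), set(gold_spans)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    precision, recall = tp / len(pred), tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this convention, a model output like `"the <unc>qu1ck</unc> fox"` yields the plain transcription plus a span covering the unreliable characters, which can then be compared against annotated spans.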
Training method combining pseudo-labeled cold start and GRPO

The authors develop a two-phase training pipeline: first, pseudo-labeled supervision in which a base model's errors are automatically tagged as a cold start; then, reinforcement learning with GRPO and a composite reward function that jointly optimizes transcription accuracy and uncertainty-tagging quality.

7 retrieved papers
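The two-phase pipeline described above can be sketched in three pieces: a pseudo-labeling step that tags wherever a base model's transcription diverges from the reference, a composite reward, and GRPO's group-relative advantage normalization. The `<unc>` tag syntax, the specific reward terms, and the weights are illustrative assumptions; the paper's actual reward additionally guards against reward hacking in ways not detailed in this report.

```python
import difflib

def pseudo_label(base_output, reference):
    """Cold-start pseudo-labeling sketch: wrap spans where the base model's
    transcription disagrees with the reference in hypothetical <unc> tags."""
    sm = difflib.SequenceMatcher(a=reference, b=base_output, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            out.append(base_output[j1:j2])
        elif j2 > j1:  # model produced text that differs from the reference
            out.append(f"<unc>{base_output[j1:j2]}</unc>")
    return "".join(out)

def composite_reward(pred_plain, reference, tag_f1, alpha=0.5, beta=0.5):
    """Illustrative multi-objective reward: a transcription-accuracy term
    (similarity ratio, a cheap stand-in for 1 - normalized edit distance)
    plus an uncertainty-tagging term (span F1), with assumed weights."""
    sm = difflib.SequenceMatcher(a=reference, b=pred_plain, autojunk=False)
    return alpha * sm.ratio() + beta * tag_f1

def grpo_advantages(group_rewards):
    """GRPO-style advantages: normalize each sampled completion's reward
    by the mean and standard deviation of its sampling group."""
    mean = sum(group_rewards) / len(group_rewards)
    var = sum((r - mean) ** 2 for r in group_rewards) / len(group_rewards)
    return [(r - mean) / (var ** 0.5 + 1e-8) for r in group_rewards]
```

For example, `pseudo_label("the qu1ck fox", "the quick fox")` tags only the mismatched character, giving the base model a supervised target that marks its own errors as uncertain.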
Blur-OCR benchmark dataset

The authors create a new benchmark dataset consisting of 107,520 training images and 2,048 evaluation images with diverse synthetic degradations applied to document images, designed specifically to evaluate uncertainty-aware OCR systems on degraded documents.

4 retrieved papers
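A minimal sketch of how such a synthetic-degradation suite might be parameterized, assuming hypothetical degradation families and parameter ranges (the actual Blur-OCR generation pipeline and its settings are not specified in this report):

```python
import random

# Hypothetical degradation families and parameter ranges; the real
# Blur-OCR settings may differ.
DEGRADATIONS = {
    "gaussian_blur": {"sigma": (0.5, 3.0)},
    "gaussian_noise": {"std": (5.0, 30.0)},
    "jpeg_compression": {"quality": (10, 50)},
    "low_resolution": {"scale": (0.25, 0.75)},
}

def sample_degradation_config(rng, max_ops=2):
    """Sample a random stack of degradations to apply to one clean
    document image, with parameters drawn uniformly from each range."""
    ops = rng.sample(sorted(DEGRADATIONS), k=rng.randint(1, max_ops))
    config = {}
    for op in ops:
        params = {}
        for name, (lo, hi) in DEGRADATIONS[op].items():
            params[name] = rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)
        config[op] = params
    return config
```

Sampling one such configuration per clean page, then rendering it with an image library, would yield the kind of diverse degraded training and evaluation images the benchmark describes.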

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though that signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Uncertainty-aware OCR paradigm with UNC tags

Contribution

Training method combining pseudo-labeled cold start and GRPO

Contribution

Blur-OCR benchmark dataset
