TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

Keywords: Process Reward Model, Tabular Reasoning, Tool Integration, Test-time Scaling

Abstract:

Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplored. Through detailed empirical analyses, we identify that existing PRMs, though widely adopted for supervising text-only reasoning steps, struggle with table-specific operations such as sub-table retrieval and schema interaction, leading to critical performance bottlenecks. To address this limitation, we propose TaTToo, a novel table-grounded PRM framework that (i) reasons explicitly over tabular reasoning steps and (ii) integrates tool-based verification to provide precise reward supervision. Concretely, we first design a scalable data curation pipeline that constructs over 60k high-quality step-level annotations by integrating table verification rationales with tool-based executions. Building on the collected data, we train TaTToo with a dual-stage paradigm: cold-start supervised fine-tuning to capture tool-use reasoning patterns, followed by reinforcement learning with tool-grounded reward shaping to align our model with table-based verification. We provide a comprehensive evaluation of the policy improvement induced by our newly designed PRM. Across 5 challenging tabular reasoning benchmarks covering numerical reasoning, fact-checking, and data analysis, TaTToo improves downstream policy LRMs by 30.9% at inference, surpasses strong PRM baselines such as Qwen-2.5-Math-PRM-72B with only 8B parameters, and demonstrates strong generalizability across diverse TTS strategies.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces TaTToo, a process reward model framework specifically designed for tabular reasoning with test-time scaling. It resides in the 'Tool-Augmented PRMs for Tabular Reasoning' leaf, which contains only two papers total (including this work). This represents a sparse, emerging research direction within the broader taxonomy of nine papers across process reward modeling. The work addresses a recognized gap: existing PRMs struggle with table-specific operations like sub-table retrieval and schema interaction, motivating a domain-specialized approach.

The taxonomy reveals that TaTToo sits at the intersection of three broader research threads: domain-specific process reward modeling, test-time scaling strategies, and PRM training paradigms. Neighboring leaves include 'Generative and Reasoning-Driven PRMs' (three papers) and 'Inference-Time Scaling for Tabular Reasoning Tasks' (two papers). The framework bridges these areas by combining tool-based verification (domain-specific) with reinforcement learning for test-time search (inference-time scaling). This positioning suggests the work synthesizes ideas from multiple established directions rather than pioneering an entirely new branch.

Among the three contributions analyzed, the literature search examined twenty-eight candidates total. The 'Tool-Grounded Thinking PRM Framework' examined ten candidates with zero refutable matches; the 'Scalable Data Curation Pipeline' also examined ten with zero refutations; the 'Dual-Stage Training Paradigm' examined eight with zero refutations. These statistics indicate that within the limited search scope, no prior work was identified that directly overlaps with the specific combination of tool-grounded process rewards and dual-stage training for tabular reasoning. However, the small candidate pool and sparse taxonomy leaf suggest this assessment reflects limited coverage rather than exhaustive validation.

Given the sparse taxonomy leaf and limited search scope, the work appears to occupy a relatively unexplored niche combining process reward modeling with tabular tool use. The absence of refutable candidates across all contributions may reflect genuine novelty in this specific integration, or may indicate that the semantic search did not surface closely related work outside the top-thirty matches. The analysis covers domain-specific PRM design but does not exhaustively address broader tabular reasoning or general test-time scaling literature.

Taxonomy

[Figure: taxonomy diagram. Legend: Core-task Taxonomy Papers; Claimed Contributions; Contribution Candidate Papers Compared; Refutable Paper]

Research Landscape Overview

Core task: process reward modeling for tabular reasoning with test-time scaling. The field structure reflects a convergence of process-level supervision, domain-specific reasoning challenges, and inference-time computation strategies. The taxonomy organizes work into several main branches: architectures and training paradigms for process reward models (PRMs) that provide step-by-step feedback, domain-specific applications where PRMs are tailored to particular reasoning contexts such as mathematical problem-solving or structured data interpretation, test-time scaling strategies that allocate additional computation during inference to improve solution quality, and comprehensive surveys synthesizing these developments. Representative works like Process Reward Thinking[1] and Rewarding Progress[3] illustrate foundational PRM training methods, while efforts such as GenPRM[5] explore generative formulations that expand the scope of process supervision beyond traditional classification-based reward assignment.

A particularly active line of work focuses on tool-augmented PRMs for tabular reasoning, where models must interact with structured data through executable operations. This setting introduces unique challenges in credit assignment and verification, as intermediate steps involve both symbolic manipulation and semantic understanding of table contents. TaTToo[0] situates itself within this specialized branch, emphasizing the integration of process rewards with test-time search over tool-assisted reasoning traces. Compared to Table-r1[4] and Table-R1[6], which also target tabular domains, TaTToo[0] places stronger emphasis on the interplay between step-level reward signals and adaptive inference-time computation budgets. Meanwhile, approaches like Adaptive Test-Time[8] explore dynamic allocation strategies across domains, and R-PRM[7] investigates reward model robustness. The central tension across these works involves balancing the granularity of process supervision, the computational overhead of test-time scaling, and the reliability of learned reward signals in guiding multi-step reasoning over structured data.
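
To make the link between step-level rewards and test-time computation concrete, the following is a minimal sketch of Best-of-N selection, one common test-time scaling strategy. It is not taken from the paper: the function names, the min-over-steps aggregation, and the `toy_scorer` stand-in for a PRM are all assumptions chosen for illustration.

```python
from typing import Callable, List, Sequence


def aggregate_min(step_scores: Sequence[float]) -> float:
    """Score a trajectory by its weakest step; empty trajectories rank last."""
    return min(step_scores) if step_scores else float("-inf")


def best_of_n(
    candidates: List[List[str]],
    score_step: Callable[[List[str], int], float],
) -> int:
    """Return the index of the candidate with the highest aggregated score."""
    if not candidates:
        raise ValueError("need at least one candidate")
    trajectory_scores = [
        aggregate_min([score_step(steps, t) for t in range(len(steps))])
        for steps in candidates
    ]
    return max(range(len(candidates)), key=trajectory_scores.__getitem__)


def toy_scorer(steps: List[str], t: int) -> float:
    """Hypothetical stand-in for a PRM: distrusts steps that mention guessing."""
    return 0.2 if "guess" in steps[t] else 0.9


candidates = [
    ["filter rows where year == 2020", "guess the total"],
    ["filter rows where year == 2020", "sum the sales column"],
]
chosen = best_of_n(candidates, toy_scorer)  # index of the selected trajectory
```

Taking the minimum over steps is only one aggregation choice; a product or the last-step score are equally plausible, and the report does not say which one the paper uses.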

Claimed Contributions

TaTToo: Tool-Grounded Thinking PRM Framework

10 retrieved papers

The authors introduce TaTToo, a process reward model specifically designed for tabular reasoning that provides step-level supervision by explicitly reasoning over table operations and incorporating external tools for verification. This framework addresses limitations of existing PRMs in supervising table retrieval and schema interaction steps.
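
As a rough illustration of what tool-based verification of a table operation could look like, the sketch below re-executes an aggregation claimed by a reasoning step and compares the result. The table representation, the helper names, and the tolerance are assumptions for this example, not the paper's tools.

```python
from typing import Dict, List, Union

Cell = Union[str, float]
Table = List[Dict[str, Cell]]


def column_sum(table: Table, column: str) -> float:
    """Recompute the sum of a numeric column directly from the table."""
    return sum(float(row[column]) for row in table)


def verify_sum_step(
    table: Table, column: str, claimed_total: float, tol: float = 1e-6
) -> bool:
    """Check a step's claimed total against the executed result."""
    return abs(column_sum(table, column) - claimed_total) <= tol


# Hypothetical two-row table used only for this example.
sales = [
    {"region": "north", "sales": 120.0},
    {"region": "south", "sales": 80.0},
]
step_ok = verify_sum_step(sales, "sales", 200.0)   # claim matches execution
step_bad = verify_sum_step(sales, "sales", 210.0)  # claim contradicts execution
```

Here the verdict comes from executing the operation rather than from reading the step's text, which is the general idea behind grounding a reward in tool output.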

Scalable Data Curation Pipeline with Tool-Augmented Annotations

10 retrieved papers

The authors develop a three-stage data curation pipeline that synthesizes over 60,000 high-quality training instances by collecting expert verification rationales, assigning table-aware rewards, and augmenting them with tool invocations and execution results for training the PRM.
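
One way to picture the output of such a pipeline is a per-step record that is filled in stage by stage. The sketch below is hypothetical: the field names, the +1/-1 reward encoding, and the `run_tool` callback are assumptions, since the report does not describe the actual data schema.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class StepAnnotation:
    """Hypothetical record for one annotated reasoning step."""
    step_text: str
    rationale: Optional[str] = None
    reward: Optional[int] = None
    tool_call: Optional[str] = None
    tool_output: Optional[str] = None


def add_rationale(a: StepAnnotation, rationale: str) -> StepAnnotation:
    """Stage 1: attach a verification rationale."""
    return replace(a, rationale=rationale)


def assign_reward(a: StepAnnotation, is_correct: bool) -> StepAnnotation:
    """Stage 2: attach a step-level reward (assumed encoding: +1 / -1)."""
    return replace(a, reward=1 if is_correct else -1)


def add_tool_result(
    a: StepAnnotation, tool_call: str, run_tool: Callable[[str], str]
) -> StepAnnotation:
    """Stage 3: attach a tool invocation and its execution result."""
    return replace(a, tool_call=tool_call, tool_output=run_tool(tool_call))


record = StepAnnotation(step_text="The total sales are 200.")
record = add_rationale(record, "Summing the sales column gives 200.")
record = assign_reward(record, is_correct=True)
# The lambda is a stub executor; a real pipeline would run the call.
record = add_tool_result(record, "sum(sales)", run_tool=lambda call: "200.0")
```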

Dual-Stage Training Paradigm with Tool-Grounded Reward Shaping

8 retrieved papers

The authors propose a two-stage training approach that first uses supervised fine-tuning to learn tool-integrated verification patterns, then applies reinforcement learning with a novel reward shaping scheme that includes label-matching, confidence calibration, and tool-grounding components to optimize the PRM for accurate table verification.
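
The three named reward components can be combined in many ways. The sketch below shows one plausible form, a weighted sum with a Brier-style calibration term. The functional forms and the weights are assumptions for illustration only; the report does not give the paper's actual formulation.

```python
def label_match_reward(pred: int, gold: int) -> float:
    """1.0 when the predicted step label equals the reference label."""
    return 1.0 if pred == gold else 0.0


def calibration_reward(confidence: float, pred: int, gold: int) -> float:
    """Brier-style term: highest when confidence matches correctness."""
    correct = 1.0 if pred == gold else 0.0
    return 1.0 - (confidence - correct) ** 2


def tool_grounding_reward(used_tool_output: bool) -> float:
    """1.0 when the verdict is supported by an executed tool result."""
    return 1.0 if used_tool_output else 0.0


def shaped_reward(
    pred: int,
    gold: int,
    confidence: float,
    used_tool_output: bool,
    w_label: float = 1.0,  # assumed weight
    w_calib: float = 0.5,  # assumed weight
    w_tool: float = 0.5,   # assumed weight
) -> float:
    """Weighted sum of the three components."""
    return (
        w_label * label_match_reward(pred, gold)
        + w_calib * calibration_reward(confidence, pred, gold)
        + w_tool * tool_grounding_reward(used_tool_output)
    )


confident_and_right = shaped_reward(1, 1, confidence=1.0, used_tool_output=True)
confident_and_wrong = shaped_reward(1, 0, confidence=1.0, used_tool_output=False)
```

Under these assumed weights, a correct, fully confident, tool-supported verdict earns the maximum of 2.0, while an overconfident wrong verdict without tool support earns 0.0.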

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[9] TATTO: Tool-Augmented Thinking PRM for Tabular Reasoning

J Zou, S Roy, VK Verma, Z Wang, D Wipf, P Lu (0)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: TaTToo: Tool-Grounded Thinking PRM Framework

Candidate paper | Verdict
[3] Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | Cannot Refute
[6] Table-R1: Inference-Time Scaling for Table Reasoning | Cannot Refute
[17] VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use | Cannot Refute
[18] Utilizing Large Language Models for Robot Skill Reward Shaping in Reinforcement Learning | Cannot Refute
[19] Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning | Cannot Refute
[20] TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning | Cannot Refute
[21] Exploring Generative Process Reward Modeling for Semi-Structured Data: A Case Study of Table Question Answering | Cannot Refute
[22] SCRIBE: Structured Mid-Level Supervision for Tool-Using Language Models | Cannot Refute
[23] Trade-R1: Bridging Verifiable Rewards to Stochastic Environments via Process-Level Reasoning Verification | Cannot Refute
[24] TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning | Cannot Refute

Contribution: Scalable Data Curation Pipeline with Tool-Augmented Annotations

Candidate paper | Verdict
[25] Toolvqa: A dataset for multi-step reasoning vqa with external tools | Cannot Refute
[26] Teaching code llms to use autocompletion tools in repository-level code generation | Cannot Refute
[27] Os-genesis: Automating gui agent trajectory construction via reverse task synthesis | Cannot Refute
[28] Visual program distillation: Distilling tools and programmatic reasoning into vision-language models | Cannot Refute
[29] InfiJanice: Joint Analysis and In-situ Correction Engine for Quantization-Induced Math Degradation in Large Language Models | Cannot Refute
[30] DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images | Cannot Refute
[31] DroidCall: A Dataset for LLM-powered Android Intent Invocation | Cannot Refute
[32] Spider2-v: How far are multimodal agents from automating data science and engineering workflows? | Cannot Refute
[33] Geospatial large language model trained with a simulated environment for generating tool-use chains autonomously | Cannot Refute
[34] Invocable APIs derived from NL2SQL datasets for LLM Tool-Calling Evaluation | Cannot Refute

Contribution: Dual-Stage Training Paradigm with Tool-Grounded Reward Shaping

Candidate paper | Verdict
[9] TATTO: Tool-Augmented Thinking PRM for Tabular Reasoning | Cannot Refute
[10] ReFT: Reasoning with Reinforced Fine-Tuning | Cannot Refute
[11] CRScore++: Reinforcement Learning with Verifiable Tool and AI Feedback for Code Review | Cannot Refute
[12] Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning | Cannot Refute
[13] AdaTooler-V: Adaptive Tool-Use for Images and Videos | Cannot Refute
[14] Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection | Cannot Refute
[15] LLMs Are Bad At Math: Improving Math Reasoning with RL and External Tooling | Cannot Refute
[16] CS598 JY2 Final Survey Report-Multimodal Web Agents | Cannot Refute
