Abstract:

Although LLMs exhibit strong reasoning capabilities, existing training methods largely depend on outcome-based feedback, which can reward correct answers reached through flawed reasoning. Prior work introduces supervision on intermediate steps but still lacks guarantees of logical soundness, which is crucial in high-stakes scenarios where logical consistency is paramount. To address this, we propose LogicReward, a novel reward system that guides model training by enforcing step-level logical correctness with a theorem prover. We further introduce Autoformalization with Soft Unification, which reduces natural language ambiguity and improves formalization quality, enabling more effective use of the theorem prover. An 8B model trained on data constructed with LogicReward surpasses GPT-4o and o4-mini by 11.6% and 2%, respectively, on natural language inference and logical reasoning tasks, using a simple training procedure. Further analysis shows that LogicReward enhances reasoning faithfulness, improves generalizability to unseen tasks such as math and commonsense reasoning, and provides a reliable reward signal even without ground-truth labels. We will release all data and code upon acceptance.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LogicReward, a reward system that enforces step-level logical correctness using a theorem prover, and Autoformalization with Soft Unification to improve natural language formalization. It resides in the 'Reward-Based and Reinforcement Learning Approaches' leaf under 'Training-Based Methods for Reasoning Enhancement,' alongside three sibling papers. This leaf represents a focused but active research direction within a broader taxonomy of 50 papers across 36 topics, indicating moderate concentration in reward-driven training methods for logical reasoning.

The taxonomy reveals that this work sits at the intersection of training-based and neurosymbolic approaches. Neighboring leaves include 'Knowledge Distillation and Model Compression' and 'Instruction Tuning for Logical Reasoning' within the same branch, while the 'Neurosymbolic Reasoning Integration' branch explores symbolic translation and solver integration without parameter updates. The paper's use of a theorem prover for reward signals bridges these areas, combining training-time optimization with formal verification tools. The taxonomy's scope notes clarify that this leaf excludes supervised fine-tuning without reward mechanisms and inference-time-only prompting methods.

Among 29 candidates examined, the analysis identified limited prior work overlap. The core LogicReward contribution examined 9 candidates, with 1 appearing to provide overlapping prior work on step-level logical verification. Autoformalization with Soft Unification examined 10 candidates with no clear refutations, suggesting this formalization technique may be more distinctive. The performance claim examined 10 candidates without refutation, though benchmark results are inherently time-sensitive. The search scope was constrained to top-K semantic matches plus citation expansion, not an exhaustive survey of all reward-based reasoning methods.

Based on this limited search, the work appears to occupy a relatively novel position by combining theorem prover-based rewards with autoformalization techniques. The single refutable candidate for LogicReward suggests some conceptual overlap exists in step-level verification approaches, but the specific integration with soft unification and the reported performance gains may differentiate this work. The analysis does not cover all possible prior work in formal verification for language models or recent concurrent developments in this rapidly evolving area.

Taxonomy

50 Core-task Taxonomy Papers
3 Claimed Contributions
29 Contribution Candidate Papers Compared
1 Refutable Paper

Research Landscape Overview

Core task: training language models for logically valid and faithful reasoning. The field has organized itself into several major branches that reflect different strategies for improving reasoning capabilities. Prompting-Based Reasoning Enhancement explores how carefully designed prompts, such as Chain-of-thought Prompting[27] and Zero-shot Reasoners[1], can elicit better reasoning without modifying model weights. Neurosymbolic Reasoning Integration combines neural networks with symbolic solvers to enforce logical consistency, while Training-Based Methods for Reasoning Enhancement focuses on fine-tuning and reinforcement learning to directly optimize reasoning behavior. Evaluation and Analysis branches examine how well models actually reason, often revealing gaps between surface accuracy and true logical validity. Adversarial and Deceptive Reasoning Scenarios probe robustness under misleading contexts, and Faithful Reasoning via Modular Architectures investigates decomposing reasoning into verifiable steps. Surveys and Comprehensive Reviews synthesize these diverse threads, as seen in works like Reasoning Survey[7] and Trustworthiness Survey[43].

Within the Training-Based Methods branch, a particularly active line of work uses reward signals and reinforcement learning to guide models toward more reliable reasoning. LogicReward[0] exemplifies this approach by designing reward functions that explicitly encourage logical validity, situating itself among other reward-driven techniques such as Step-aware Verifier[2] and Collaborative Verification[41]. Nearby, SuperCorrect[46] and Search-Based Correction[48] explore iterative refinement strategies that combine training with search or self-correction mechanisms. A central tension across these methods is balancing the efficiency of end-to-end learning against the interpretability and guarantees offered by more structured or symbolic approaches. LogicReward[0] addresses this by embedding logical constraints directly into the reward structure, contrasting with works like Self-verification[3] that rely on the model's own outputs to judge correctness. This cluster highlights ongoing questions about how best to align training objectives with the nuanced requirements of faithful, step-by-step reasoning.

Claimed Contributions

LogicReward: A novel reward system enforcing step-level logical correctness

The authors introduce LogicReward, a reward mechanism that evaluates reasoning chains for both premise validity (grounding in given context) and logic validity (formal logical soundness verified by a theorem prover). This provides step-level supervision with formal logical guarantees, unlike outcome-based or probabilistic process rewards.

9 retrieved papers (1 can refute)
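As described above, the reward decomposes each step into premise validity (is the step grounded in the given context?) and logic validity (does the conclusion formally follow?). The sketch below illustrates that decomposition with a toy Horn-clause check standing in for the theorem prover; all function names and the reward aggregation are invented for illustration, not the paper's implementation.

```python
def premise_valid(step_premises, context):
    """Premise validity: every premise a step cites must be grounded in the
    context (or in a conclusion already verified at an earlier step)."""
    return all(p in context for p in step_premises)

def logic_valid(step_premises, conclusion, rules):
    """Logic validity: the conclusion must follow from the cited premises.
    A toy Horn-clause lookup stands in for the theorem prover here."""
    return any(head == conclusion and all(b in step_premises for b in body)
               for body, head in rules)

def logic_reward(chain, context, rules):
    """Step-level reward: a step earns credit only if it is both grounded and
    logically sound; verified conclusions extend the usable context."""
    known = set(context)
    score = 0
    for premises, conclusion in chain:
        if premise_valid(premises, known) and logic_valid(premises, conclusion, rules):
            score += 1
            known.add(conclusion)
    return score / len(chain)

# Toy example: "Socrates is a man; all men are mortal" as one Horn rule.
rules = [(("man(socrates)",), "mortal(socrates)")]
chain = [(("man(socrates)",), "mortal(socrates)")]
print(logic_reward(chain, {"man(socrates)"}, rules))  # 1.0
```

Unlike an outcome-based reward, this signal localizes failures: a chain that reaches the right answer through an ungrounded or invalid step still loses credit at that step.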
Autoformalization with Soft Unification

The authors propose a method that prompts LLMs to supplement implicit assumptions and reduce ambiguities in natural language reasoning steps before converting them to symbolic logic. This improves the success rate of theorem-prover verification by making implicit information explicit.

10 retrieved papers
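The pipeline described above, disambiguate first and then formalize, can be sketched as two LLM passes. The prompt wording, the `llm` callable interface, and the stand-in model below are assumptions made for this sketch, not the paper's actual prompts.

```python
# Hypothetical two-pass autoformalization: an LLM first rewrites a step so
# implicit assumptions are explicit and entities are named consistently
# ("soft unification"), then a second pass translates it to symbolic logic.

CLARIFY_PROMPT = (
    "Rewrite the reasoning step below so that every implicit assumption is "
    "stated explicitly and every entity is named consistently:\n{step}"
)
FORMALIZE_PROMPT = (
    "Translate the clarified step into first-order logic:\n{step}"
)

def autoformalize(step, llm):
    """Disambiguate, then formalize, so the prover sees matching predicates."""
    clarified = llm(CLARIFY_PROMPT.format(step=step))
    return llm(FORMALIZE_PROMPT.format(step=clarified))

# A trivial stand-in LLM so the sketch runs end-to-end.
def dummy_llm(prompt):
    if prompt.startswith("Rewrite"):
        return "All birds can fly; Tweety is a bird; therefore Tweety can fly."
    return "forall x (Bird(x) -> Fly(x)), Bird(tweety) |- Fly(tweety)"

print(autoformalize("Tweety flies because it's a bird.", dummy_llm))
```

The point of the first pass is that a prover cannot unify "it" with "Tweety" or recover the unstated rule "all birds can fly"; making both explicit is what raises the verification success rate.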
State-of-the-art performance on NLI and logical reasoning benchmarks

By constructing training datasets using LogicReward and applying standard SFT and DPO procedures, the authors achieve new state-of-the-art results on natural language inference and logical reasoning tasks, demonstrating the effectiveness of their approach with relatively simple training methods.

10 retrieved papers
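One plausible way such reward-driven data construction could feed DPO is to sample several chains per prompt, score them with the reward, and pair the best against the worst. The selection rule and function names below are assumptions for illustration, not the paper's exact recipe.

```python
# Hedged sketch: turning a step-level reward into DPO preference pairs.

def build_dpo_pairs(prompts, sample_fn, reward_fn, n_samples=4):
    """For each prompt, sample several reasoning chains, score them with the
    reward, and pair the highest-scoring chain against the lowest-scoring one.
    Prompts whose samples all tie are skipped (no preference signal)."""
    pairs = []
    for prompt in prompts:
        chains = [sample_fn(prompt) for _ in range(n_samples)]
        scored = sorted(chains, key=reward_fn)
        if reward_fn(scored[-1]) > reward_fn(scored[0]):
            pairs.append({"prompt": prompt,
                          "chosen": scored[-1],
                          "rejected": scored[0]})
    return pairs
```

With pairs in this shape, standard SFT (on the chosen chains) and off-the-shelf DPO training apply unchanged, which is consistent with the claim that simple training procedures suffice once the reward does the filtering.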

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

LogicReward: A novel reward system enforcing step-level logical correctness

The authors introduce LogicReward, a reward mechanism that evaluates reasoning chains for both premise validity (grounding in given context) and logic validity (formal logical soundness verified by a theorem prover). This provides step-level supervision with formal logical guarantees, unlike outcome-based or probabilistic process rewards.

Contribution

Autoformalization with Soft Unification

The authors propose a method that prompts LLMs to supplement implicit assumptions and reduce ambiguities in natural language reasoning steps before converting them to symbolic logic. This improves the success rate of theorem-prover verification by making implicit information explicit.

Contribution

State-of-the-art performance on NLI and logical reasoning benchmarks

By constructing training datasets using LogicReward and applying standard SFT and DPO procedures, the authors achieve new state-of-the-art results on natural language inference and logical reasoning tasks, demonstrating the effectiveness of their approach with relatively simple training methods.