RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: logical reasoning, rule-based reasoning, reinforcement learning, language models
Abstract:

Rule-based reasoning is widely acknowledged as a fundamental problem in reasoning. While recent studies show that large reasoning models (LRMs) possess remarkable reasoning capabilities enhanced by reinforcement learning (RL), real applications still face severe challenges due to variations in rule formats, types, and complexity. To mitigate this issue, we introduce RuleReasoner, an effective method for rule-based reasoning built on a wide collection of curated tasks and a novel domain-aware dynamic sampling approach in RL. Specifically, RuleReasoner resamples each training batch by updating the domain weights based on historical rewards. This facilitates domain balance and active learning schedules for RL, obviating static mix-training schedules engineered by humans. Evaluations on in-distribution (ID) and out-of-distribution (OOD) benchmarks reveal that RuleReasoner outperforms frontier LRMs by a significant margin (Δ4.1% on eight ID tasks and Δ10.4% on three OOD tasks over OpenAI-o1). Notably, our approach also exhibits higher computational efficiency than prior methods.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces RuleReasoner, a training framework for rule-based reasoning in large language models using reinforcement learning with domain-aware dynamic sampling. It sits within the 'Text-Domain Reasoning with Rule-Based RL' leaf, which contains only three papers total including this one. This is a relatively sparse research direction within the broader taxonomy of 50 papers, suggesting the specific combination of rule-based reasoning and RL for text-domain LLMs remains an emerging area rather than a crowded subfield.

The taxonomy reveals neighboring work in multimodal reasoning (MM-Eureka, Visual Aha Moment) and symbolic integration branches (Neural Logic RL, SymDQN), but the text-domain leaf is notably isolated. The sibling papers Logic-RL and CPGD both address rule extraction and compositional generalization respectively, while RuleReasoner focuses on dynamic sampling for domain balance during RL training. The taxonomy's scope notes clarify that this leaf excludes multimodal and symbolic integration approaches, positioning RuleReasoner squarely in pure text-based rule reasoning rather than cross-modal or formal logic synthesis.

Among the 30 candidates examined, the domain-aware dynamic sampling contribution has one refutable candidate among its 10, while the RuleCollection-32K dataset appears more novel, with zero refutations among its 10. The RLVR training-regularization framework likewise has one refutable candidate among its 10. These statistics suggest moderate prior-work overlap for the training-methodology components, though the limited search scope means substantial related work may exist beyond the top-30 semantic matches. The dataset contribution appears least contested within this bounded search.

Based on the limited literature search covering 30 candidates, the work appears to occupy a relatively sparse research direction with some methodological overlap in training approaches but potentially novel dataset contributions. The analysis cannot claim exhaustiveness—only that among semantically similar recent papers, the core training innovations show modest prior work while the curated task collection shows less direct precedent.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 2

Research Landscape Overview

Core task: rule-based reasoning with reinforcement learning. The field integrates symbolic reasoning structures (temporal logic, fuzzy rules, automata) with RL to improve interpretability, safety, and generalization. The taxonomy reveals several major branches:

- Large language and multimodal models that leverage rule-based guidance for text and vision tasks (e.g., MM-Eureka[3], Visual Aha Moment[4]).
- Interactive and human-guided approaches, where domain experts or iterative feedback shape rule discovery (e.g., Persistent Rule Interactive[2], Iterative Rule Guided[5]).
- Temporal logic integration, providing formal specifications for safe control (e.g., Temporal Logic Safe[8], Temporal Logic Reward[18]).
- Symbolic and neuro-symbolic RL, merging neural networks with logic-based representations (e.g., Neural Logic RL[10], SymDQN[19]).
- Fuzzy logic-based methods that handle uncertainty in continuous domains (e.g., Fuzzy Fractal Control[11]).
- Domain-specific applications targeting robotics, autonomous driving, and energy systems (e.g., Lane Change Constraints[12], Eco-Driving Bus[14]).
- Inference-time and meta-level reasoning, exploring how models decide when to apply deliberate reasoning (e.g., Think or Not[17], RL of Thoughts[29]).
- Automata-based learning, using finite-state machines or logic programs to structure policies (e.g., GALOIS[39], Tsetlin Machine[47]).

Recent work highlights a tension between end-to-end neural flexibility and the interpretability of explicit rules. Many studies in the symbolic and neuro-symbolic branch pursue hybrid architectures that balance differentiable learning with logical constraints (e.g., Off-Policy Differentiable Logic[44], Deep ILP RL[42]), while temporal logic integration prioritizes provable safety guarantees in safety-critical domains (e.g., Temporal Logic Goals[34], Safe Highway Driving[43]).
Within the text-domain reasoning cluster, RuleReasoner[0] sits alongside Logic-RL[1] and CPGD[9], all exploring how to extract or enforce logical rules during language-based reasoning tasks. Compared to Logic-RL[1], which emphasizes formal logic extraction, RuleReasoner[0] appears to focus more tightly on rule-based inference strategies tailored to textual problem-solving, while CPGD[9] investigates compositional generalization through policy decomposition. These neighboring works collectively illustrate ongoing efforts to make RL agents reason more transparently and reliably in structured symbolic environments.

Claimed Contributions

RuleReasoner training framework with domain-aware dynamic sampling

The authors propose RuleReasoner, a training framework that enhances rule-based reasoning through reinforcement learning. It introduces a domain-aware dynamic sampling (DADS) method that resamples training batches by updating domain weights based on historical rewards, facilitating domain balance and active learning schedules without static human-engineered mixing.

10 retrieved papers compared; verdict: Can Refute
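The sampling mechanism described above (update domain weights from historical rewards, then resample each batch) can be sketched as follows. This is a hypothetical reconstruction, not the authors' implementation: the class name, the softmax-over-difficulty update, and the default score of 0.5 for unseen domains are all assumptions made for illustration, under the premise that lower-reward (harder) domains are upweighted.

```python
import math
import random
from collections import defaultdict

class DomainAwareSampler:
    """Minimal sketch of domain-aware dynamic sampling (DADS).

    Assumption: domains with lower historical reward (harder,
    under-learned domains) get higher sampling weight via a
    softmax over a difficulty score.
    """

    def __init__(self, domains, temperature=1.0):
        self.domains = list(domains)
        self.temperature = temperature
        self.history = defaultdict(list)  # domain -> past rollout rewards

    def record(self, domain, reward):
        """Log the reward observed for a rollout from `domain`."""
        self.history[domain].append(reward)

    def weights(self):
        """Softmax over (1 - mean reward); unseen domains default to 0.5."""
        scores = []
        for d in self.domains:
            h = self.history[d]
            mean_r = sum(h) / len(h) if h else 0.5
            scores.append((1.0 - mean_r) / self.temperature)
        z = sum(math.exp(s) for s in scores)
        return [math.exp(s) / z for s in scores]

    def sample_batch(self, pool, batch_size):
        """Resample a training batch according to the current weights."""
        w = self.weights()
        picked = random.choices(self.domains, weights=w, k=batch_size)
        return [random.choice(pool[d]) for d in picked]
```

In this sketch, a domain whose rollouts keep earning high verifiable rewards is gradually sampled less, freeing batch capacity for domains the policy still fails on, which is one plausible way to realize the "active learning schedule" the contribution describes.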
RuleCollection-32K dataset for rule-based reasoning

The authors curate and release RuleCollection-32K, a dataset containing 32K examples across eight rule-based reasoning tasks. The dataset features varying rule formats, reasoning forms, and complexity levels, designed to enable training and evaluation of generalizable rule application rather than memorization.

10 retrieved papers compared; verdict: no refutation found
RLVR framework with training regularization for rule-based reasoning

The authors design a Reinforcement Learning with Verifiable Rewards (RLVR) framework incorporating training regularization techniques such as disabling entropy bonus, discarding KL divergence, and rule order shuffling. This framework achieves stable training dynamics for complex rule-based reasoning tasks and improves generalization to unseen rules.

10 retrieved papers compared; verdict: Can Refute
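The regularization choices named in this contribution (entropy bonus disabled, KL term discarded, rule order shuffled) can be made concrete with a small sketch. The config keys, function names, and exact-match reward below are illustrative assumptions, not the authors' API; they only show what each knob would look like in an RLVR-style training loop.

```python
import random

# Illustrative configuration mirroring the reported regularization
# choices; these field names are assumptions, not the authors' API.
RLVR_CONFIG = {
    "entropy_bonus_coef": 0.0,  # entropy bonus disabled
    "use_kl_penalty": False,    # KL divergence term discarded
    "shuffle_rules": True,      # rule order shuffling augmentation
}

def verifiable_reward(prediction: str, gold: str) -> float:
    """Binary verifiable reward: exact match on the final answer.
    (One common choice for RLVR; the paper's checker may differ.)"""
    return 1.0 if prediction.strip() == gold.strip() else 0.0

def shuffle_rule_order(rules, rng=None):
    """Present the prompt's rules in random order so the policy cannot
    exploit positional cues (the rule-order-shuffling regularizer)."""
    rng = rng or random.Random()
    shuffled = list(rules)
    rng.shuffle(shuffled)
    return shuffled
```

A reward that can be checked programmatically is what makes the KL and entropy terms safe to drop in this setting: the binary signal alone is assumed to anchor the policy, while shuffling guards against memorizing rule positions rather than applying the rules.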

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: RuleReasoner training framework with domain-aware dynamic sampling

Contribution: RuleCollection-32K dataset for rule-based reasoning

Contribution: RLVR framework with training regularization for rule-based reasoning