RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
Overview
Overall Novelty Assessment
The paper introduces RuleReasoner, a training framework for rule-based reasoning in large language models using reinforcement learning with domain-aware dynamic sampling. It sits within the 'Text-Domain Reasoning with Rule-Based RL' leaf, which contains only three papers in total, including this one. This is a relatively sparse research direction within the broader taxonomy of 50 papers, suggesting that the specific combination of rule-based reasoning and RL for text-domain LLMs remains an emerging area rather than a crowded subfield.
The taxonomy reveals neighboring work in multimodal reasoning (MM-Eureka, Visual Aha Moment) and symbolic integration branches (Neural Logic RL, SymDQN), but the text-domain leaf is notably isolated. The sibling papers Logic-RL and CPGD address rule extraction and compositional generalization, respectively, while RuleReasoner focuses on dynamic sampling for domain balance during RL training. The taxonomy's scope notes clarify that this leaf excludes multimodal and symbolic integration approaches, positioning RuleReasoner squarely in pure text-based rule reasoning rather than cross-modal or formal logic synthesis.
Among the 30 candidates examined, the domain-aware dynamic sampling contribution was matched by one refutable candidate out of 10 examined, while the RuleCollection-32K dataset appears more novel, with zero refutations among its 10 candidates. The RLVR training regularization framework also has one refutable candidate out of 10 examined. These statistics suggest moderate prior-work overlap for the training-methodology components, though the limited search scope means substantial related work may exist beyond the top-30 semantic matches. The dataset contribution appears least contested within this bounded search.
Based on the limited literature search covering 30 candidates, the work appears to occupy a relatively sparse research direction, with some methodological overlap in training approaches but a potentially novel dataset contribution. The analysis cannot claim exhaustiveness; it shows only that, among semantically similar recent papers, the core training innovations have modest prior work while the curated task collection has less direct precedent.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose RuleReasoner, a training framework that enhances rule-based reasoning through reinforcement learning. It introduces a domain-aware dynamic sampling (DADS) method that resamples training batches by updating domain weights based on historical rewards, facilitating domain balance and active learning schedules without static human-engineered mixing.
The authors curate and release RuleCollection-32K, a dataset containing 32K examples across eight rule-based reasoning tasks. The dataset features varying rule formats, reasoning forms, and complexity levels, designed to enable training and evaluation of generalizable rule application rather than memorization.
The authors design a Reinforcement Learning with Verifiable Rewards (RLVR) framework incorporating training regularization techniques such as disabling the entropy bonus, discarding the KL-divergence penalty, and shuffling rule order. The framework achieves stable training dynamics on complex rule-based reasoning tasks and improves generalization to unseen rules.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning PDF
[9] CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models PDF
Contribution Analysis
Detailed comparisons for each claimed contribution
RuleReasoner training framework with domain-aware dynamic sampling
The authors propose RuleReasoner, a training framework that enhances rule-based reasoning through reinforcement learning. It introduces a domain-aware dynamic sampling (DADS) method that resamples training batches by updating domain weights based on historical rewards, facilitating domain balance and active learning schedules without static human-engineered mixing.
[57] DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training PDF
[51] RL for Reasoning by Adaptively Revealing Rationales PDF
[52] Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning PDF
[53] Multi-Agent Collaborative Decision-Making Using Small Vision-Language Models for Autonomous Driving PDF
[54] Easyso: Exploration-enhanced reinforcement learning for logic synthesis sequence optimization and a comprehensive rl environment PDF
[55] Logical specifications-guided dynamic task sampling for reinforcement learning agents PDF
[56] Natural Logic at the Core: Dynamic Rewards for Entailment Tree Generation PDF
[58] Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective PDF
[59] Rethinking Reasoning Quality in Large Language Models through Enhanced Chain-of-Thought via RL PDF
[60] Enhancing neural theorem proving through data augmentation and dynamic sampling method PDF
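The DADS mechanism described in this contribution can be sketched as a weighted sampler whose per-domain weights track historical rewards. The sketch below is illustrative only: the exponential-moving-average update and inverse-reward weighting are assumptions standing in for the paper's actual formula, and the class and field names (`DomainAwareSampler`, `smoothing`, `avg_reward`) are hypothetical.

```python
import random

class DomainAwareSampler:
    """Illustrative sketch of domain-aware dynamic sampling (DADS):
    domains with lower historical reward are upweighted so the policy
    sees more examples from domains it currently handles poorly."""

    def __init__(self, domains, smoothing=0.9):
        self.weights = {d: 1.0 for d in domains}
        self.avg_reward = {d: 0.0 for d in domains}
        self.smoothing = smoothing  # EMA factor for historical rewards

    def update(self, domain, reward):
        # Exponential moving average of per-domain reward (assumed update rule).
        s = self.smoothing
        self.avg_reward[domain] = s * self.avg_reward[domain] + (1 - s) * reward
        # Upweight low-reward (harder) domains; +1e-3 avoids division by zero.
        self.weights[domain] = 1.0 / (self.avg_reward[domain] + 1e-3)

    def sample_batch(self, pools, batch_size):
        # pools: dict mapping domain -> list of training examples.
        domains = list(pools)
        probs = [self.weights[d] for d in domains]
        chosen = random.choices(domains, weights=probs, k=batch_size)
        return [random.choice(pools[d]) for d in chosen]
```

In this sketch, a domain that keeps earning high rewards is sampled less often, while a struggling domain is resampled more, which is the claimed effect of replacing a static human-engineered domain mix with an adaptive schedule.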
RuleCollection-32K dataset for rule-based reasoning
The authors curate and release RuleCollection-32K, a dataset containing 32K examples across eight rule-based reasoning tasks. The dataset features varying rule formats, reasoning forms, and complexity levels, designed to enable training and evaluation of generalizable rule application rather than memorization.
[3] MM-Eureka: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning PDF
[61] Rulearena: A benchmark for rule-guided reasoning with llms in real-world scenarios PDF
[62] Logicgame: Benchmarking rule-based reasoning abilities of large language models PDF
[63] Chain of Logic: Rule-Based Reasoning with Large Language Models PDF
[64] Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models PDF
[65] Integrated Rule-Based Geomodeling – Explicit and Implicit Approaches PDF
[66] From Explicit Rules to Implicit Reasoning in an Interpretable Violence Monitoring System PDF
[67] Knowledge-based Research on Automation Engines PDF
[68] A rule-based approach to aspect extraction from product reviews PDF
[69] Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars PDF
RLVR framework with training regularization for rule-based reasoning
The authors design a Reinforcement Learning with Verifiable Rewards (RLVR) framework incorporating training regularization techniques such as disabling the entropy bonus, discarding the KL-divergence penalty, and shuffling rule order. The framework achieves stable training dynamics on complex rule-based reasoning tasks and improves generalization to unseen rules.
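As a concrete illustration of these regularization choices, the sketch below shows a hypothetical training configuration with the entropy bonus and KL penalty zeroed out, plus a prompt builder that shuffles rule order so the model cannot rely on positional memorization. All names (`RLVR_CONFIG`, `build_prompt`, the config keys) are assumptions for illustration; the paper's actual implementation may differ.

```python
import random

# Hypothetical RLVR training configuration mirroring the regularization
# choices described above; field names are illustrative, not the paper's.
RLVR_CONFIG = {
    "entropy_bonus_coef": 0.0,  # entropy bonus disabled
    "kl_penalty_coef": 0.0,     # KL divergence to the reference policy discarded
    "shuffle_rules": True,      # permute rule order in each prompt
}

def build_prompt(rules, question, shuffle_rules=True, rng=random):
    """Assemble a rule-based reasoning prompt, optionally shuffling the
    rule order so rewards reflect rule application, not rule position."""
    rules = list(rules)
    if shuffle_rules:
        rng.shuffle(rules)
    rule_block = "\n".join(f"Rule {i + 1}: {r}" for i, r in enumerate(rules))
    return f"{rule_block}\nQuestion: {question}"
```

A verifiable reward would then score the model's final answer against the ground truth for each shuffled prompt, with no entropy or KL terms added to the policy-gradient objective.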