QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Model; Reinforcement Learning
Abstract:

Reinforcement learning (RL) has emerged as a central paradigm for training large language models (LLMs) on reasoning tasks. Yet recent studies question RL’s ability to incentivize reasoning capacity beyond the base model. This raises a key challenge: how can RL be adapted to solve harder reasoning problems more effectively? To address this challenge, we propose a simple yet effective strategy via Question Augmentation: introduce partial solutions during training to reduce problem difficulty and provide more informative learning signals. Our method, QuestA, when applied during RL training on math reasoning tasks, improves not only pass@1 but also pass@k—particularly on problems where standard RL struggles to make progress. This enables continual improvement over strong open-source models such as DeepScaleR and OpenMath Nemotron, further enhancing their reasoning capabilities. We achieve new state-of-the-art results on math benchmarks using 1.5B-parameter models: 72.50% (+10.73%) on AIME24, 62.29% (+12.79%) on AIME25, and 41.67% (+10.11%) on HMMT25. Code, data, and models are available at https://anonymous.4open.science/r/questa932.
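The pass@1 and pass@k metrics cited in the abstract are standardly computed with the unbiased combinatorial estimator over n sampled generations per problem. The following sketch is my own illustration of that estimator, not code from the paper:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn without replacement from n generations is correct,
    given that c of the n generations are correct."""
    if n - c < k:
        # Fewer than k incorrect generations exist, so any k-subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 generations per problem, 4 of them correct.
print(pass_at_k(16, 4, 1))  # pass@1 = 4/16 = 0.25
print(pass_at_k(16, 4, 8))  # pass@8, close to 1
```

Averaging this quantity over all benchmark problems yields the reported pass@k score.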

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes QuestA, a question augmentation strategy that introduces partial solutions during RL training to reduce problem difficulty and improve learning signals for mathematical reasoning. It sits within the Mathematical Reasoning leaf of the taxonomy, which contains only three papers total, indicating a relatively focused but not overcrowded research direction. The sibling papers—WizardMath and Deeptheorem—similarly target RL-based mathematical problem solving, suggesting this work occupies a well-defined niche within domain-specific applications of RL for reasoning enhancement.

The taxonomy reveals that Mathematical Reasoning is one subcategory under Domain-Specific Applications, alongside Software Engineering, Multimodal Reasoning, and Specialized Applications. Neighboring branches include Core RL Algorithms (policy optimization, reward design) and Reasoning Paradigms (chain-of-thought, search integration). QuestA's focus on question augmentation connects it to Training Strategies and Optimization under Core RL Algorithms, as partial solutions can be viewed as a curriculum learning or data augmentation technique. The taxonomy's scope notes clarify that domain-specific methods like QuestA are distinguished from general-purpose algorithm design, positioning this work as an application-driven adaptation rather than a foundational RL innovation.

Among the 29 candidates examined, the contribution-level analysis reveals mixed novelty signals. For the QuestA method itself, 9 candidates were examined and 1 was judged a refutable match, suggesting that some prior work on question augmentation or partial-solution strategies exists within the limited search scope. For the theoretical analysis, 10 candidates were examined with 2 refutable matches, indicating that the benefits of partial-solution augmentation may have been explored previously. For the state-of-the-art results, 10 candidates were examined with 1 refutable match, implying that the performance claims on these benchmarks face some prior competition. These statistics reflect a top-K semantic search, not an exhaustive literature review, so the presence of refutable candidates indicates overlap within the examined subset rather than a definitive lack of novelty.

Given the limited search scope of 29 candidates, the analysis suggests QuestA introduces a focused adaptation of RL training for mathematical reasoning, with some overlap in each contribution area among the examined papers. The work appears to refine existing ideas—question augmentation, partial solutions, and benchmark performance—rather than introduce entirely unprecedented concepts. However, the sparse Mathematical Reasoning leaf and the specific combination of techniques may still offer incremental value. A broader literature search would be needed to assess whether the integration of these elements constitutes a meaningful advance over the field's current state.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 4

Research Landscape Overview

Core task: Enhancing reasoning capacity in large language models through reinforcement learning. The field has organized itself into several major branches that reflect both methodological diversity and application focus. At the highest level, researchers distinguish between core RL algorithms and training methods—where foundational techniques such as policy gradient variants and reward modeling are developed—and empirical analyses that systematically evaluate how RL shapes reasoning behavior. Parallel branches address reasoning paradigms and architectures (exploring chain-of-thought, search-based inference, and latent reasoning structures), domain-specific applications (including mathematical reasoning, code generation, and interactive agents), and resource-constrained methods that prioritize efficiency. Additional branches cover alternative paradigms (such as diffusion-based or non-standard architectures), external knowledge integration, and survey papers that synthesize emerging trends.

Works like Deepseek-r1[3] and Large Reasoning Models[8] illustrate how core RL techniques scale to complex reasoning tasks, while WizardMath[18] and Deeptheorem[48] exemplify domain-specific mathematical applications. Within this landscape, mathematical reasoning has emerged as a particularly active testbed, drawing on both supervised fine-tuning pipelines and RL-driven exploration to handle multi-step problem solving. Many studies in this branch investigate trade-offs between sample efficiency, verifiability of intermediate steps, and the balance between exploration and exploitation during training.

QuestA[0] situates itself squarely in this mathematical reasoning cluster, emphasizing RL-based enhancement of reasoning chains for question-answering tasks. Its approach aligns closely with neighbors like WizardMath[18], which also targets mathematical problem solving, and Deeptheorem[48], which extends RL methods to formal theorem proving. Compared to these works, QuestA[0] focuses on augmenting questions with partial solutions to densify the training signal for question-driven reasoning, contributing to ongoing efforts to make RL more effective and interpretable in structured mathematical domains.

Claimed Contributions

QuestA method via question augmentation with partial solutions

The authors introduce QuestA, a data augmentation approach that prepends partial solutions to hard reasoning problems during reinforcement learning training. This method scaffolds difficult problems by revealing intermediate steps, making them more tractable while providing denser reward signals for more efficient RL training.

9 retrieved papers
Can Refute
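The described augmentation—prepending part of a reference solution to a hard problem—can be sketched as a simple prompt transformation. The truncation ratio, prompt template, and function name below are illustrative assumptions of mine, not the paper's exact implementation:

```python
def augment_question(question: str, full_solution: str, ratio: float = 0.5) -> str:
    """Prepend a partial solution (the first `ratio` fraction of the
    reference solution's lines) to a hard problem, so the RL policy
    only has to complete the remaining reasoning steps."""
    steps = [s for s in full_solution.split("\n") if s.strip()]
    hint = "\n".join(steps[: max(1, int(len(steps) * ratio))])
    return (
        f"{question}\n\n"
        f"Here is a partial solution to get you started:\n{hint}\n\n"
        f"Continue from here and finish the solution."
    )

prompt = augment_question(
    "Find the sum of all positive divisors of 12.",
    "The divisors of 12 are 1, 2, 3, 4, 6, 12.\nTheir sum is 1+2+3+4+6+12 = 28.",
    ratio=0.5,
)
print(prompt)
```

Because the hint reveals intermediate steps but withholds the final answer, a correct rollout still requires the model to complete the reasoning, which is what makes the resulting reward signal informative rather than trivially dense.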
Theoretical analysis of partial-solution augmentation benefits

The paper provides formal theoretical justification showing that augmenting questions with partial solutions (hints) improves RL sample efficiency. The analysis demonstrates that hints enable the model to discover valid trajectories with asymptotically lower sampling budget compared to training without hints.

10 retrieved papers
Can Refute
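The claimed sample-efficiency benefit can be sketched with a back-of-envelope argument (my rendering, not the paper's formal statement): if each rollout solves a hard problem independently with probability $p$, the number of rollouts until the first correct trajectory is geometrically distributed, so

```latex
% Expected rollouts until a first correct trajectory, with
% per-sample success probability p (geometric distribution):
\mathbb{E}[N] = \frac{1}{p}.
% If a partial-solution hint raises the per-sample success
% probability to p' > p, the expected sampling budget drops to
\mathbb{E}[N_{\text{hint}}] = \frac{1}{p'} \ll \frac{1}{p}
\quad \text{when } p' \gg p.
```

On hard problems where $p$ is near zero, standard RL receives almost no nonzero-reward trajectories, so even a modest hint-induced increase to $p'$ can shrink the sampling budget by orders of magnitude.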
State-of-the-art results for 1.5B-parameter models on math benchmarks

The authors demonstrate that applying QuestA to small-scale models (1.5B parameters) achieves new state-of-the-art performance on challenging mathematical reasoning benchmarks, substantially outperforming existing models of similar size and even matching or exceeding much larger 32B-parameter models.

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

QuestA method via question augmentation with partial solutions

The authors introduce QuestA, a data augmentation approach that prepends partial solutions to hard reasoning problems during reinforcement learning training. This method scaffolds difficult problems by revealing intermediate steps, making them more tractable while providing denser reward signals for more efficient RL training.

Contribution

Theoretical analysis of partial-solution augmentation benefits

The paper provides formal theoretical justification showing that augmenting questions with partial solutions (hints) improves RL sample efficiency. The analysis demonstrates that hints enable the model to discover valid trajectories with asymptotically lower sampling budget compared to training without hints.

Contribution

State-of-the-art results for 1.5B-parameter models on math benchmarks

The authors demonstrate that applying QuestA to small-scale models (1.5B parameters) achieves new state-of-the-art performance on challenging mathematical reasoning benchmarks, substantially outperforming existing models of similar size and even matching or exceeding much larger 32B-parameter models.