QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Model; Reinforcement Learning
Abstract:

Reinforcement learning (RL) has emerged as a central paradigm for training large language models (LLMs) on reasoning tasks. Yet recent studies question RL’s ability to incentivize reasoning capacity beyond the base model. This raises a key challenge: how can RL be adapted to solve harder reasoning problems more effectively? To address this challenge, we propose a simple yet effective strategy via Question Augmentation: introduce partial solutions during training to reduce problem difficulty and provide more informative learning signals. Our method, QuestA, when applied during RL training on math reasoning tasks, improves not only pass@1 but also pass@k—particularly on problems where standard RL struggles to make progress. This enables continual improvement over strong open-source models such as DeepScaleR and OpenMath Nemotron, further enhancing their reasoning capabilities. We achieve new state-of-the-art results on math benchmarks using 1.5B-parameter models: 72.50% (+10.73%) on AIME24, 62.29% (+12.79%) on AIME25, and 41.67% (+10.11%) on HMMT25. Code, data, and models are available at https://anonymous.4open.science/r/questa932.
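The pass@1 and pass@k metrics cited in the abstract are standardly computed with the unbiased combinatorial estimator over n sampled generations per problem. The following sketch is my own illustration of that estimator, not code from the paper:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples drawn without replacement from n generations is correct,
    given that c of the n generations are correct."""
    if n - c < k:
        # Fewer than k incorrect generations exist, so any k-subset
        # must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 generations per problem, 4 of them correct.
print(pass_at_k(16, 4, 1))  # pass@1 = 4/16 = 0.25
print(pass_at_k(16, 4, 8))  # pass@8, close to 1
```

Averaging this quantity over all benchmark problems yields the reported pass@k score.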

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes QuestA, a question augmentation strategy that introduces partial solutions during RL training to reduce problem difficulty and improve learning signals for mathematical reasoning. It sits within the Mathematical Reasoning leaf of the taxonomy, which contains only three papers total, indicating a relatively focused but not overcrowded research direction. The sibling papers—WizardMath and Deeptheorem—similarly target RL-based mathematical problem solving, suggesting this work occupies a well-defined niche within domain-specific applications of RL for reasoning enhancement.

The taxonomy reveals that Mathematical Reasoning is one subcategory under Domain-Specific Applications, alongside Software Engineering, Multimodal Reasoning, and Specialized Applications. Neighboring branches include Core RL Algorithms (policy optimization, reward design) and Reasoning Paradigms (chain-of-thought, search integration). QuestA's focus on question augmentation connects it to Training Strategies and Optimization under Core RL Algorithms, as partial solutions can be viewed as a curriculum learning or data augmentation technique. The taxonomy's scope notes clarify that domain-specific methods like QuestA are distinguished from general-purpose algorithm design, positioning this work as an application-driven adaptation rather than a foundational RL innovation.

Among the 29 candidates examined, the contribution-level analysis reveals mixed novelty signals. For the QuestA method itself, 9 candidates were examined and 1 was judged a refutable match, suggesting that some prior work on question augmentation or partial-solution strategies exists within the limited search scope. For the theoretical analysis, 10 candidates were examined with 2 refutable matches, indicating that the benefits of partial-solution augmentation may have been explored previously. For the state-of-the-art results, 10 candidates were examined with 1 refutable match, implying that the performance claims on these benchmarks face some prior competition. These statistics reflect a top-K semantic search, not an exhaustive literature review, so the presence of refutable candidates indicates overlap within the examined subset rather than a definitive lack of novelty.

Given the limited search scope of 29 candidates, the analysis suggests QuestA introduces a focused adaptation of RL training for mathematical reasoning, with some overlap in each contribution area among the examined papers. The work appears to refine existing ideas—question augmentation, partial solutions, and benchmark performance—rather than introduce entirely unprecedented concepts. However, the sparse Mathematical Reasoning leaf and the specific combination of techniques may still offer incremental value. A broader literature search would be needed to assess whether the integration of these elements constitutes a meaningful advance over the field's current state.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 4

Research Landscape Overview

Core task: Enhancing reasoning capacity in large language models through reinforcement learning. The field has organized itself into several major branches that reflect both methodological diversity and application focus. At the highest level, researchers distinguish between core RL algorithms and training methods—where foundational techniques such as policy gradient variants and reward modeling are developed—and empirical analyses that systematically evaluate how RL shapes reasoning behavior. Parallel branches address reasoning paradigms and architectures (exploring chain-of-thought, search-based inference, and latent reasoning structures), domain-specific applications (including mathematical reasoning, code generation, and interactive agents), and resource-constrained methods that prioritize efficiency. Additional branches cover alternative paradigms (such as diffusion-based or non-standard architectures), external knowledge integration, and survey papers that synthesize emerging trends.

Works like Deepseek-r1[3] and Large Reasoning Models[8] illustrate how core RL techniques scale to complex reasoning tasks, while WizardMath[18] and Deeptheorem[48] exemplify domain-specific mathematical applications. Within this landscape, mathematical reasoning has emerged as a particularly active testbed, drawing on both supervised fine-tuning pipelines and RL-driven exploration to handle multi-step problem solving. Many studies in this branch investigate trade-offs between sample efficiency, verifiability of intermediate steps, and the balance between exploration and exploitation during training.

QuestA[0] situates itself squarely in this mathematical reasoning cluster, emphasizing RL-based enhancement of reasoning chains for question-answering tasks. Its approach aligns closely with neighbors like WizardMath[18], which also targets mathematical problem solving, and Deeptheorem[48], which extends RL methods to formal theorem proving. Compared to these works, QuestA[0] focuses on augmenting questions with partial solutions to densify the training signal for question-driven reasoning, contributing to ongoing efforts to make RL more effective and interpretable in structured mathematical domains.

Claimed Contributions

QuestA method via question augmentation with partial solutions

The authors introduce QuestA, a data augmentation approach that prepends partial solutions to hard reasoning problems during reinforcement learning training. This method scaffolds difficult problems by revealing intermediate steps, making them more tractable while providing denser reward signals for more efficient RL training.

9 retrieved papers
Can Refute
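The described augmentation—prepending part of a reference solution to a hard problem—can be sketched as a simple prompt transformation. The truncation ratio, prompt template, and function name below are illustrative assumptions of mine, not the paper's exact implementation:

```python
def augment_question(question: str, full_solution: str, ratio: float = 0.5) -> str:
    """Prepend a partial solution (the first `ratio` fraction of the
    reference solution's lines) to a hard problem, so the RL policy
    only has to complete the remaining reasoning steps."""
    steps = [s for s in full_solution.split("\n") if s.strip()]
    hint = "\n".join(steps[: max(1, int(len(steps) * ratio))])
    return (
        f"{question}\n\n"
        f"Here is a partial solution to get you started:\n{hint}\n\n"
        f"Continue from here and finish the solution."
    )

prompt = augment_question(
    "Find the sum of all positive divisors of 12.",
    "The divisors of 12 are 1, 2, 3, 4, 6, 12.\nTheir sum is 1+2+3+4+6+12 = 28.",
    ratio=0.5,
)
print(prompt)
```

Because the hint reveals intermediate steps but withholds the final answer, a correct rollout still requires the model to complete the reasoning, which is what makes the resulting reward signal informative rather than trivially dense.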
Theoretical analysis of partial-solution augmentation benefits

The paper provides formal theoretical justification showing that augmenting questions with partial solutions (hints) improves RL sample efficiency. The analysis demonstrates that hints enable the model to discover valid trajectories with asymptotically lower sampling budget compared to training without hints.

10 retrieved papers
Can Refute
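The claimed sample-efficiency benefit can be sketched with a back-of-envelope argument (my rendering, not the paper's formal statement): if each rollout solves a hard problem independently with probability $p$, the number of rollouts until the first correct trajectory is geometrically distributed, so

```latex
% Expected rollouts until a first correct trajectory, with
% per-sample success probability p (geometric distribution):
\mathbb{E}[N] = \frac{1}{p}.
% If a partial-solution hint raises the per-sample success
% probability to p' > p, the expected sampling budget drops to
\mathbb{E}[N_{\text{hint}}] = \frac{1}{p'} \ll \frac{1}{p}
\quad \text{when } p' \gg p.
```

On hard problems where $p$ is near zero, standard RL receives almost no nonzero-reward trajectories, so even a modest hint-induced increase to $p'$ can shrink the sampling budget by orders of magnitude.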
State-of-the-art results for 1.5B-parameter models on math benchmarks

The authors demonstrate that applying QuestA to small-scale models (1.5B parameters) achieves new state-of-the-art performance on challenging mathematical reasoning benchmarks, substantially outperforming existing models of similar size and even matching or exceeding much larger 32B-parameter models.

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

QuestA method via question augmentation with partial solutions

The authors introduce QuestA, a data augmentation approach that prepends partial solutions to hard reasoning problems during reinforcement learning training. This method scaffolds difficult problems by revealing intermediate steps, making them more tractable while providing denser reward signals for more efficient RL training.

Contribution

Theoretical analysis of partial-solution augmentation benefits

The paper provides formal theoretical justification showing that augmenting questions with partial solutions (hints) improves RL sample efficiency. The analysis demonstrates that hints enable the model to discover valid trajectories with asymptotically lower sampling budget compared to training without hints.

Contribution

State-of-the-art results for 1.5B-parameter models on math benchmarks

The authors demonstrate that applying QuestA to small-scale models (1.5B parameters) achieves new state-of-the-art performance on challenging mathematical reasoning benchmarks, substantially outperforming existing models of similar size and even matching or exceeding much larger 32B-parameter models.