QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation
Overview
Overall Novelty Assessment
The paper proposes QuestA, a question-augmentation strategy that introduces partial solutions during RL training to reduce problem difficulty and improve learning signals for mathematical reasoning. It sits within the Mathematical Reasoning leaf of the taxonomy, which contains only three papers in total, indicating a focused but not overcrowded research direction. The sibling papers, WizardMath and DeepTheorem, similarly target RL-based mathematical problem solving, suggesting this work occupies a well-defined niche within domain-specific applications of RL for reasoning enhancement.
The taxonomy reveals that Mathematical Reasoning is one subcategory under Domain-Specific Applications, alongside Software Engineering, Multimodal Reasoning, and Specialized Applications. Neighboring branches include Core RL Algorithms (policy optimization, reward design) and Reasoning Paradigms (chain-of-thought, search integration). QuestA's focus on question augmentation connects it to Training Strategies and Optimization under Core RL Algorithms, as partial solutions can be viewed as a curriculum learning or data augmentation technique. The taxonomy's scope notes clarify that domain-specific methods like QuestA are distinguished from general-purpose algorithm design, positioning this work as an application-driven adaptation rather than a foundational RL innovation.
Among the 29 candidates examined, the contribution-level analysis reveals mixed novelty signals. For the QuestA method itself, 9 candidates were examined and 1 refutable match was found, suggesting that some prior work on question augmentation or partial-solution strategies exists within the limited search scope. For the theoretical analysis, 10 candidates were examined with 2 refutable matches, indicating that the benefits of partial-solution augmentation may have been explored previously. For the state-of-the-art results, 10 candidates were examined with 1 refutable match, implying that the performance claims face some prior competition on these benchmarks. These statistics reflect a top-K semantic search, not an exhaustive literature review, so the presence of refutable candidates indicates overlap within the examined subset rather than a definitive lack of novelty.
Given the limited search scope of 29 candidates, the analysis suggests QuestA introduces a focused adaptation of RL training for mathematical reasoning, with some overlap in each contribution area among the examined papers. The work appears to refine existing ideas—question augmentation, partial solutions, and benchmark performance—rather than introduce entirely unprecedented concepts. However, the sparse Mathematical Reasoning leaf and the specific combination of techniques may still offer incremental value. A broader literature search would be needed to assess whether the integration of these elements constitutes a meaningful advance over the field's current state.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce QuestA, a data augmentation approach that prepends partial solutions to hard reasoning problems during reinforcement learning training. This method scaffolds difficult problems by revealing intermediate steps, making them more tractable while providing denser reward signals for more efficient RL training.
The paper provides formal theoretical justification showing that augmenting questions with partial solutions (hints) improves RL sample efficiency. The analysis demonstrates that hints enable the model to discover valid trajectories with asymptotically lower sampling budget compared to training without hints.
The authors demonstrate that applying QuestA to small-scale models (1.5B parameters) achieves new state-of-the-art performance on challenging mathematical reasoning benchmarks, substantially outperforming existing models of similar size and even matching or exceeding much larger 32B-parameter models.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[18] WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
[48] DeepTheorem: Advancing LLM Reasoning for Theorem Proving through Natural Language and Reinforcement Learning
Contribution Analysis
Detailed comparisons for each claimed contribution
QuestA method via question augmentation with partial solutions
The authors introduce QuestA, a data augmentation approach that prepends partial solutions to hard reasoning problems during reinforcement learning training. This method scaffolds difficult problems by revealing intermediate steps, making them more tractable while providing denser reward signals for more efficient RL training.
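The augmentation step described above can be sketched in a few lines. This is an illustrative reconstruction only: the function name, prompt template, and the hint-fraction parameter are assumptions for exposition, not the paper's actual implementation.

```python
def augment_question(problem: str, solution_steps: list[str], frac: float = 0.5) -> str:
    """Prepend the first `frac` of the reference solution steps as a hint,
    scaffolding a hard problem so RL rollouts receive denser reward signals."""
    k = int(len(solution_steps) * frac)
    hint = "\n".join(solution_steps[:k])
    if not hint:  # frac too small to reveal any step: leave the problem unchanged
        return problem
    return f"{problem}\n\nPartial solution (hint):\n{hint}\n\nContinue from here:"

# Example: scaffold a hard problem with the first half of its known solution.
steps = [
    "Let x be the number of apples.",
    "Then 2x + 3 = 11.",
    "So x = 4.",
]
prompt = augment_question("Solve for the number of apples.", steps, frac=0.5)
```

During training, the fraction of revealed steps could be annealed toward zero as the policy improves, which is how such scaffolding is typically turned into a curriculum.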
[58] From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization
[51] Using Incomplete and Incorrect Plans to Shape Reinforcement Learning in Long-Sequence Sparse-Reward Tasks
[52] Reinforcement Learning with Dynamic Completion for Answering Multi-Hop Questions over Incomplete Knowledge Graph
[53] ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs
[54] Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning
[55] Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning
[56] Generative Question Refinement with Deep Reinforcement Learning in Retrieval-Based QA System
[57] OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
[59] Converting Natural Language to Query Languages Using Large Language Models: A Systematic Literature Review
Theoretical analysis of partial-solution augmentation benefits
The paper provides formal theoretical justification showing that augmenting questions with partial solutions (hints) improves RL sample efficiency. The analysis demonstrates that hints enable the model to discover valid trajectories with asymptotically lower sampling budget compared to training without hints.
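The intuition behind the sample-efficiency claim can be sketched with a simple geometric-sampling argument. This is an illustrative simplification under independent-rollout assumptions, not the paper's formal theorem; the symbols $p$, $p'$, and $\mathrm{aug}(q)$ are introduced here for exposition.

```latex
% Let p be the probability that the policy samples a correct trajectory
% for question q, and p' the corresponding probability for the augmented
% question aug(q), with p' > p because the hint removes early failure modes.
% Under independent sampling, the number of rollouts until the first
% correct trajectory is geometric, so the expected sampling budget is
\[
  \mathbb{E}[N \mid q] = \frac{1}{p},
  \qquad
  \mathbb{E}[N \mid \mathrm{aug}(q)] = \frac{1}{p'},
\]
% and the budget shrinks by the factor p/p', which is asymptotically
% significant on hard problems where p is vanishingly small.
```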
[60] StepHint: Multi-level Stepwise Hints Enhance Reinforcement Learning to Reason
[61] Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models
[58] From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization
[62] Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning
[63] Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning
[64] DRlinker: Deep Reinforcement Learning for Optimization in Fragment Linking Design
[65] Sample Efficient Reinforcement Learning with Partial Dynamics Knowledge
[66] Integrating Reaction Schemes, Reagent Databases, and Virtual Libraries into Fragment-Based Design by Reinforcement Learning
[67] Guiding Reinforcement Learning with Incomplete System Dynamics
[68] ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning
State-of-the-art results for 1.5B-parameter models on math benchmarks
The authors demonstrate that applying QuestA to small-scale models (1.5B parameters) achieves new state-of-the-art performance on challenging mathematical reasoning benchmarks, substantially outperforming existing models of similar size and even matching or exceeding much larger 32B-parameter models.