RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Reasoning abstractions; LLM; RL; Structured exploration; Reasoning
Abstract:

Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement algorithmic procedures that can be used to deduce answers to hard problems. Doing so requires reusing primitives, intermediate results, or procedures across multiple problems. While RL post-training on long chains of thought ultimately aims to uncover this kind of algorithmic behavior, the depth-first and brute-force nature of reasoning traces learned by these models suggests that this promise is far from fulfilled. To enable more effective reasoning, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward learning successful reasoning. We train models to propose several useful abstractions given a problem, followed by RL training that incentivizes building a solution while using the information provided by these abstractions. This results in a two-player RL training paradigm, abbreviated as RLAD, that jointly trains an abstraction generator and an abstraction-conditioned solution generator. This setup effectively enables structured exploration, decouples the learning signals of abstraction proposal and solution generation, and improves generalization to harder problems. We also show that spending more test-time compute on generating abstractions is more beneficial for performance than generating more solutions at large inference-time budgets, illustrating the role of abstractions in guiding global exploration.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces RLAD, a two-player reinforcement learning framework that trains models to generate reasoning abstractions—concise natural language descriptions of procedural and factual knowledge—and then use these abstractions to guide solution generation. It resides in the 'Reinforcement Learning for Abstraction Discovery' leaf, which contains only three papers total, including this one. This is a notably sparse research direction within the broader taxonomy of fifty papers, suggesting the specific combination of RL-driven abstraction generation and utilization remains relatively underexplored compared to more crowded areas like chain-of-thought methods or knowledge distillation.

The taxonomy reveals that neighboring leaves address related but distinct challenges: 'Latent and Continuous Reasoning Representations' explores non-linguistic reasoning spaces, 'Step-Back and High-Level Concept Abstraction' focuses on prompting-based derivation of first principles, and 'Tool Creation and Abstract-Concrete Disentanglement' emphasizes creating reusable tools. RLAD bridges these directions by combining explicit natural language abstractions with RL-based discovery, distinguishing itself from purely prompting-based or latent-space approaches. The broader 'Reasoning Abstraction Generation and Utilization' branch contrasts with 'Chain-of-Thought Reasoning Methods' and 'Reinforcement Learning for Solution Generation,' highlighting RLAD's unique focus on structured exploration through abstraction rather than direct solution optimization.

Among twenty-six candidates examined, the contribution-level analysis reveals mixed novelty signals. For the core idea of reasoning abstractions as natural language descriptions, ten candidates were examined and two potentially refuting prior works were found, suggesting some overlap with existing abstraction-based methods. For the RLAD two-player training paradigm, six candidates were examined with no clear refutations, indicating this specific RL formulation may be more novel. For the abstraction-generation method that summarizes solution attempts, ten candidates were examined without refutations, though the limited search scope means substantial related work could exist beyond the top-K semantic matches analyzed here.

Given the sparse taxonomy leaf and limited literature search scope, RLAD appears to occupy a relatively novel position within RL-driven abstraction discovery, though the first contribution shows some prior work overlap among the candidates examined. The analysis covers top semantic matches and citation expansion but does not constitute an exhaustive field survey, leaving open the possibility of additional related work in adjacent research communities or under different terminologies.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 2

Research Landscape Overview

Core task: training language models to generate and utilize reasoning abstractions. The field encompasses diverse approaches to enhancing model reasoning capabilities, organized into several major branches. Chain-of-Thought Reasoning Methods[41] explore explicit step-by-step reasoning processes, while Reasoning Abstraction Generation and Utilization focuses on discovering and leveraging higher-level patterns. Knowledge Transfer and Model Specialization addresses distilling reasoning abilities into smaller models[6][13], and Reinforcement Learning for Solution Generation applies RL techniques to improve reasoning outputs[9][28]. Additional branches cover External Knowledge Integration[27][34], Domain-Specific Reasoning spanning mathematical[7][8] to social contexts[17], Abstract Visual Reasoning Benchmarks[32][35], and foundational work on Evaluation and Analysis[40] alongside Conceptual Frameworks[50]. Multimodal and Cross-Domain Reasoning[4] extends these ideas beyond text.

Within Reinforcement Learning for Abstraction Discovery, a small but active cluster explores how models can autonomously identify reusable reasoning patterns. RLAD[0] sits squarely in this space, using RL to discover abstractions that generalize across problem instances. This contrasts with nearby work like Discovering Abstractions[21], which may emphasize different discovery mechanisms, and ProtoReasoning[49], which focuses on prototype-based reasoning structures. The central tension involves balancing the expressiveness of learned abstractions against their computational cost and generalizability. While some approaches like Continuous Latent Reasoning[3] operate in latent spaces for efficiency, others like CREATOR[24] generate explicit tools or abstractions.

RLAD[0] navigates this landscape by framing abstraction discovery as a reinforcement learning problem, positioning itself among methods that treat reasoning improvement as an optimization challenge rather than purely supervised distillation or prompting-based elicitation.

Claimed Contributions

Reasoning abstractions as concise natural language descriptions of procedural and factual knowledge

The authors introduce the concept of reasoning abstractions, which are compressed representations of shared procedures underlying multiple candidate solutions to a problem. These abstractions function as hints that enable LLMs to solve harder problems by building on insights, rather than searching over procedural information itself.

10 retrieved papers
Can Refute
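The abstraction-as-hint mechanism described above can be sketched as a simple prompt-construction step: the abstraction text is supplied alongside the problem so the solver can build on the insight rather than rediscover the procedure. The function name and prompt wording below are illustrative assumptions, not the paper's actual format.

```python
def build_abstraction_conditioned_prompt(problem: str, abstractions: list[str]) -> str:
    """Format a problem together with candidate reasoning abstractions as hints.

    Hypothetical sketch: the paper conditions a solution generator on
    abstractions; the exact prompt template here is an assumption.
    """
    hint_block = "\n".join(f"- {a}" for a in abstractions)
    return (
        "Useful abstractions (procedural and factual hints):\n"
        f"{hint_block}\n\n"
        f"Problem: {problem}\n"
        "Solve the problem, building on the hints above."
    )

prompt = build_abstraction_conditioned_prompt(
    "Find the last digit of 7^2024.",
    ["Powers of an integer mod 10 are eventually periodic; find the cycle length first."],
)
```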
RLAD: a two-player RL training paradigm for joint abstraction and solution generation

The authors develop RLAD, a reinforcement learning framework that jointly trains two models: an abstraction generator that proposes reasoning abstractions given a problem, and an abstraction-conditioned solution generator that produces solutions using those abstractions. This setup decouples learning signals and enables structured exploration.

6 retrieved papers
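As a rough illustration of the decoupled two-player loop, the toy Python skeleton below stands in stub scorers for both models. The reward shaping shown (raw verifier score for the solver, mean-baselined advantage for the abstraction generator) is an assumption chosen to illustrate decoupled learning signals, not RLAD's exact objective.

```python
import random

random.seed(0)

def abstraction_generator(problem: str, k: int = 4) -> list[str]:
    # Stand-in for the abstraction-generator policy: propose k hint strings.
    return [f"hint-{i} for {problem}" for i in range(k)]

def solution_generator(problem: str, abstraction: str) -> float:
    # Stand-in for the abstraction-conditioned solver plus verifier:
    # pretend this returns a correctness score in [0, 1].
    return random.random()

def training_step(problem: str) -> tuple[list[float], list[float]]:
    """One toy joint step: each player gets its own reward signal."""
    abstractions = abstraction_generator(problem)
    # Solver reward: per-solution verifier score.
    solver_rewards = [solution_generator(problem, a) for a in abstractions]
    # Abstraction reward: how much a hint lifts the solver over the mean,
    # so the abstraction player's signal is decoupled from raw correctness.
    baseline = sum(solver_rewards) / len(solver_rewards)
    abstraction_rewards = [r - baseline for r in solver_rewards]
    return solver_rewards, abstraction_rewards

solver_r, abs_r = training_step("p1")
```

Mean-baselining makes the abstraction rewards sum to zero within a step, so the abstraction player is rewarded only for hints that outperform its own alternatives.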
Method for generating abstractions by summarizing solution attempts

The authors propose a method to generate initial reasoning abstractions by collecting diverse solution traces for a problem and prompting a stronger model to summarize useful concepts appearing in these traces. This approach enables models to identify useful substructures within the reasoning graph.

10 retrieved papers
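The trace-summarization step above can be caricatured in a few lines: collect diverse solution traces for a problem and distill recurring steps into candidate abstractions. The paper prompts a stronger LLM to do the summarization; here a simple frequency count over trace steps stands in for that model call, so the function and threshold below are illustrative assumptions.

```python
from collections import Counter

def summarize_traces(traces: list[list[str]], min_support: int = 2) -> list[str]:
    """Return steps recurring across traces as candidate abstractions.

    Toy stand-in for prompting a stronger model to summarize useful
    concepts appearing in diverse solution attempts.
    """
    # Count in how many distinct traces each step appears.
    counts = Counter(step for trace in traces for step in set(trace))
    return [step for step, c in counts.most_common() if c >= min_support]

traces = [
    ["reduce mod 10", "find cycle length", "apply exponent mod cycle"],
    ["reduce mod 10", "find cycle length", "brute force small cases"],
    ["guess and check", "reduce mod 10"],
]
abstractions = summarize_traces(traces)
# "reduce mod 10" appears in all three traces, so it surfaces first.
```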

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Reasoning abstractions as concise natural language descriptions of procedural and factual knowledge

The authors introduce the concept of reasoning abstractions, which are compressed representations of shared procedures underlying multiple candidate solutions to a problem. These abstractions function as hints that enable LLMs to solve harder problems by building on insights, rather than searching over procedural information itself.

Contribution

RLAD: a two-player RL training paradigm for joint abstraction and solution generation

The authors develop RLAD, a reinforcement learning framework that jointly trains two models: an abstraction generator that proposes reasoning abstractions given a problem, and an abstraction-conditioned solution generator that produces solutions using those abstractions. This setup decouples learning signals and enables structured exploration.

Contribution

Method for generating abstractions by summarizing solution attempts

The authors propose a method to generate initial reasoning abstractions by collecting diverse solution traces for a problem and prompting a stronger model to summarize useful concepts appearing in these traces. This approach enables models to identify useful substructures within the reasoning graph.