RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Reasoning abstractions; LLM; RL; Structured exploration; Reasoning
Abstract:

Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement algorithmic procedures that can be used to deduce answers to hard problems. Doing so requires reusing primitives, intermediate results, or procedures across multiple problems. While RL post-training on long chains of thought ultimately aims to uncover this kind of algorithmic behavior, the depth-first and brute-force nature of reasoning traces learned by these models suggests that this promise is far from fulfilled. To enable more effective reasoning, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward learning successful reasoning. We train models to propose several useful abstractions given a problem, followed by RL training that incentivizes building a solution while using the information provided by these abstractions. This results in a two-player RL training paradigm, abbreviated as RLAD, that jointly trains an abstraction generator and an abstraction-conditioned solution generator. This setup effectively enables structured exploration, decouples the learning signals of abstraction proposal and solution generation, and improves generalization to harder problems. We also show that spending more test-time compute on generating abstractions is more beneficial for performance than generating more solutions at large inference-time budgets, illustrating the role of abstractions in guiding global exploration.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces RLAD, a two-player reinforcement learning framework that trains models to generate reasoning abstractions—concise natural language descriptions of procedural and factual knowledge—and then use these abstractions to guide solution generation. It resides in the 'Reinforcement Learning for Abstraction Discovery' leaf, which contains only three papers total, including this one. This is a notably sparse research direction within the broader taxonomy of fifty papers, suggesting the specific combination of RL-driven abstraction generation and utilization remains relatively underexplored compared to more crowded areas like chain-of-thought methods or knowledge distillation.

The taxonomy reveals that neighboring leaves address related but distinct challenges: 'Latent and Continuous Reasoning Representations' explores non-linguistic reasoning spaces, 'Step-Back and High-Level Concept Abstraction' focuses on prompting-based derivation of first principles, and 'Tool Creation and Abstract-Concrete Disentanglement' emphasizes creating reusable tools. RLAD bridges these directions by combining explicit natural language abstractions with RL-based discovery, distinguishing itself from purely prompting-based or latent-space approaches. The broader 'Reasoning Abstraction Generation and Utilization' branch contrasts with 'Chain-of-Thought Reasoning Methods' and 'Reinforcement Learning for Solution Generation,' highlighting RLAD's unique focus on structured exploration through abstraction rather than direct solution optimization.

Among twenty-six candidates examined, the contribution-level analysis reveals mixed novelty signals. For the core idea of reasoning abstractions as natural language descriptions, ten candidates were examined and two potentially refuting prior works were found, suggesting some overlap with existing abstraction-based methods. For the RLAD two-player training paradigm, six candidates were examined with no clear refutations, indicating this specific RL formulation may be more novel. For the abstraction-generation method that summarizes solution attempts, ten candidates were examined without refutations, though the limited search scope means substantial related work could exist beyond the top-K semantic matches analyzed here.

Given the sparse taxonomy leaf and limited literature search scope, RLAD appears to occupy a relatively novel position within RL-driven abstraction discovery, though the first contribution shows some prior work overlap among the candidates examined. The analysis covers top semantic matches and citation expansion but does not constitute an exhaustive field survey, leaving open the possibility of additional related work in adjacent research communities or under different terminologies.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 2

Research Landscape Overview

Core task: training language models to generate and utilize reasoning abstractions. The field encompasses diverse approaches to enhancing model reasoning capabilities, organized into several major branches. Chain-of-Thought Reasoning Methods[41] explore explicit step-by-step reasoning processes, while Reasoning Abstraction Generation and Utilization focuses on discovering and leveraging higher-level patterns. Knowledge Transfer and Model Specialization addresses distilling reasoning abilities into smaller models[6][13], and Reinforcement Learning for Solution Generation applies RL techniques to improve reasoning outputs[9][28]. Additional branches cover External Knowledge Integration[27][34], Domain-Specific Reasoning spanning mathematical[7][8] to social contexts[17], Abstract Visual Reasoning Benchmarks[32][35], and foundational work on Evaluation and Analysis[40] alongside Conceptual Frameworks[50]. Multimodal and Cross-Domain Reasoning[4] extends these ideas beyond text.

Within Reinforcement Learning for Abstraction Discovery, a small but active cluster explores how models can autonomously identify reusable reasoning patterns. RLAD[0] sits squarely in this space, using RL to discover abstractions that generalize across problem instances. This contrasts with nearby work like Discovering Abstractions[21], which may emphasize different discovery mechanisms, and ProtoReasoning[49], which focuses on prototype-based reasoning structures. The central tension involves balancing the expressiveness of learned abstractions against their computational cost and generalizability. While some approaches like Continuous Latent Reasoning[3] operate in latent spaces for efficiency, others like CREATOR[24] generate explicit tools or abstractions.

RLAD[0] navigates this landscape by framing abstraction discovery as a reinforcement learning problem, positioning itself among methods that treat reasoning improvement as an optimization challenge rather than purely supervised distillation or prompting-based elicitation.

Claimed Contributions

Reasoning abstractions as concise natural language descriptions of procedural and factual knowledge

The authors introduce the concept of reasoning abstractions, which are compressed representations of shared procedures underlying multiple candidate solutions to a problem. These abstractions function as hints that enable LLMs to solve harder problems by building on insights, rather than searching over procedural information itself.

10 retrieved papers
Can Refute
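The abstraction-as-hint mechanism described above can be sketched as a simple prompt-construction step: the abstraction text is supplied alongside the problem so the solver can build on the insight rather than rediscover the procedure. The function name and prompt wording below are illustrative assumptions, not the paper's actual format.

```python
def build_abstraction_conditioned_prompt(problem: str, abstractions: list[str]) -> str:
    """Format a problem together with candidate reasoning abstractions as hints.

    Hypothetical sketch: the paper conditions a solution generator on
    abstractions; the exact prompt template here is an assumption.
    """
    hint_block = "\n".join(f"- {a}" for a in abstractions)
    return (
        "Useful abstractions (procedural and factual hints):\n"
        f"{hint_block}\n\n"
        f"Problem: {problem}\n"
        "Solve the problem, building on the hints above."
    )

prompt = build_abstraction_conditioned_prompt(
    "Find the last digit of 7^2024.",
    ["Powers of an integer mod 10 are eventually periodic; find the cycle length first."],
)
```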
RLAD: a two-player RL training paradigm for joint abstraction and solution generation

The authors develop RLAD, a reinforcement learning framework that jointly trains two models: an abstraction generator that proposes reasoning abstractions given a problem, and an abstraction-conditioned solution generator that produces solutions using those abstractions. This setup decouples learning signals and enables structured exploration.

6 retrieved papers
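As a rough illustration of the decoupled two-player loop, the toy Python skeleton below stands in stub scorers for both models. The reward shaping shown (raw verifier score for the solver, mean-baselined advantage for the abstraction generator) is an assumption chosen to illustrate decoupled learning signals, not RLAD's exact objective.

```python
import random

random.seed(0)

def abstraction_generator(problem: str, k: int = 4) -> list[str]:
    # Stand-in for the abstraction-generator policy: propose k hint strings.
    return [f"hint-{i} for {problem}" for i in range(k)]

def solution_generator(problem: str, abstraction: str) -> float:
    # Stand-in for the abstraction-conditioned solver plus verifier:
    # pretend this returns a correctness score in [0, 1].
    return random.random()

def training_step(problem: str) -> tuple[list[float], list[float]]:
    """One toy joint step: each player gets its own reward signal."""
    abstractions = abstraction_generator(problem)
    # Solver reward: per-solution verifier score.
    solver_rewards = [solution_generator(problem, a) for a in abstractions]
    # Abstraction reward: how much a hint lifts the solver over the mean,
    # so the abstraction player's signal is decoupled from raw correctness.
    baseline = sum(solver_rewards) / len(solver_rewards)
    abstraction_rewards = [r - baseline for r in solver_rewards]
    return solver_rewards, abstraction_rewards

solver_r, abs_r = training_step("p1")
```

Mean-baselining makes the abstraction rewards sum to zero within a step, so the abstraction player is rewarded only for hints that outperform its own alternatives.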
Method for generating abstractions by summarizing solution attempts

The authors propose a method to generate initial reasoning abstractions by collecting diverse solution traces for a problem and prompting a stronger model to summarize useful concepts appearing in these traces. This approach enables models to identify useful substructures within the reasoning graph.

10 retrieved papers
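The trace-summarization step above can be caricatured in a few lines: collect diverse solution traces for a problem and distill recurring steps into candidate abstractions. The paper prompts a stronger LLM to do the summarization; here a simple frequency count over trace steps stands in for that model call, so the function and threshold below are illustrative assumptions.

```python
from collections import Counter

def summarize_traces(traces: list[list[str]], min_support: int = 2) -> list[str]:
    """Return steps recurring across traces as candidate abstractions.

    Toy stand-in for prompting a stronger model to summarize useful
    concepts appearing in diverse solution attempts.
    """
    # Count in how many distinct traces each step appears.
    counts = Counter(step for trace in traces for step in set(trace))
    return [step for step, c in counts.most_common() if c >= min_support]

traces = [
    ["reduce mod 10", "find cycle length", "apply exponent mod cycle"],
    ["reduce mod 10", "find cycle length", "brute force small cases"],
    ["guess and check", "reduce mod 10"],
]
abstractions = summarize_traces(traces)
# "reduce mod 10" appears in all three traces, so it surfaces first.
```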

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Reasoning abstractions as concise natural language descriptions of procedural and factual knowledge

The authors introduce the concept of reasoning abstractions, which are compressed representations of shared procedures underlying multiple candidate solutions to a problem. These abstractions function as hints that enable LLMs to solve harder problems by building on insights, rather than searching over procedural information itself.

Contribution

RLAD: a two-player RL training paradigm for joint abstraction and solution generation

The authors develop RLAD, a reinforcement learning framework that jointly trains two models: an abstraction generator that proposes reasoning abstractions given a problem, and an abstraction-conditioned solution generator that produces solutions using those abstractions. This setup decouples learning signals and enables structured exploration.

Contribution

Method for generating abstractions by summarizing solution attempts

The authors propose a method to generate initial reasoning abstractions by collecting diverse solution traces for a problem and prompting a stronger model to summarize useful concepts appearing in these traces. This approach enables models to identify useful substructures within the reasoning graph.