ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: LLM Agents, Memory Mechanism, Reasoning, Test-Time Scaling
Abstract:

With the growing adoption of large language model (LLM) agents in persistent, real-world roles, they naturally encounter continuous streams of tasks and interactions. A key limitation, however, is their failure to learn from this accumulated experience, forcing them to discard valuable insights and repeat past errors. Unlike prior works that primarily store raw experience or successful routines, we propose ReasoningBank, a novel memory framework that allows an agent to self-curate generalizable reasoning strategies from both its successful and failed experiences for future leverage. This mechanism enables agents to generalize across tasks and become more capable over time. To accelerate and diversify this test-time learning process, we further propose memory-aware test-time scaling (MaTTS), which leverages a powerful synergy between memory and test-time scaling. On one hand, relevant memory from ReasoningBank guides the scaling process toward more effective exploration and improved reliability. On the other, scaling, in both parallel and sequential settings, generates abundant, diverse experiences that provide rich contrastive signals for synthesizing higher-quality memory. Experiments on web browsing and software engineering tasks show that ReasoningBank consistently outperforms existing memory mechanisms in both effectiveness and efficiency, with MaTTS further amplifying the gains. These findings position memory-driven experience as a new dimension of test-time scaling, where emergent behaviors naturally arise and agents acquire self-evolving capabilities.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ReasoningBank, a memory framework enabling agents to self-curate generalizable reasoning strategies from both successful and failed experiences, and introduces memory-aware test-time scaling (MaTTS) to accelerate learning. Within the taxonomy, it resides in the 'Reflection and Verbal Reinforcement Learning' leaf under 'Memory-Based Learning Mechanisms', alongside three sibling papers. This leaf represents a moderately populated research direction focused on linguistic feedback and self-reflection without parametric updates, situated within a broader branch containing four distinct learning paradigms across the field's 50 papers.

The taxonomy reveals neighboring research directions that contextualize this work's positioning. Adjacent leaves include 'Self-Evolving and Lifelong Learning Agents' emphasizing continuous capability improvement, 'Experience Replay and Trajectory Synthesis' leveraging stored trajectories for sample efficiency, and 'Cross-Domain Experience Sharing' enabling knowledge transfer across tasks. The scope note for the paper's leaf explicitly excludes parametric weight updates, distinguishing reflection-based approaches from gradient-driven methods. This boundary clarifies that ReasoningBank operates through memory curation rather than model fine-tuning, connecting it to verbal reinforcement paradigms while diverging from replay-based learning mechanisms in neighboring leaves.

Across the three contributions, 27 candidate papers were examined. For the ReasoningBank framework, one of the 10 candidates examined is refutable, suggesting some overlap with prior memory architectures. For MaTTS, none of the 7 candidates examined is refutable, indicating relatively sparse prior work on memory-guided test-time scaling. For the third contribution, memory-driven experience as a scaling dimension, none of the 10 candidates examined is refutable, suggesting this framing may be less explored. Because the search scope is limited, these statistics reflect top semantic matches rather than exhaustive coverage; the single refutable case for ReasoningBank indicates that at least one prior work in the examined set addresses similar memory curation concepts.
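As a quick consistency check on these counts, the per-contribution figures can be tallied directly. This is only an illustrative sketch: the dictionary layout and variable names are assumptions made for the example, while the numbers are the ones reported in this assessment.

```python
# Per-contribution counts as reported in this assessment.
# The dictionary layout is an illustrative assumption, not the tool's format.
candidates = {
    "ReasoningBank memory framework": {"examined": 10, "refutable": 1},
    "Memory-aware test-time scaling (MaTTS)": {"examined": 7, "refutable": 0},
    "Memory-driven experience as a new scaling dimension": {"examined": 10, "refutable": 0},
}

total_examined = sum(c["examined"] for c in candidates.values())
total_refutable = sum(c["refutable"] for c in candidates.values())

print(total_examined, total_refutable)  # 27 1
```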

Based on the top-27 semantic matches examined, the work appears to introduce novel combinations of memory curation with test-time scaling, though the ReasoningBank framework itself shows some overlap with existing memory architectures. The analysis covers semantically proximate papers but does not guarantee exhaustive field coverage, particularly for works using different terminology or published in specialized venues. The taxonomy positioning suggests the paper bridges reflection-based learning with scaling paradigms, occupying a moderately explored niche within the broader agent learning landscape.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 1

Research Landscape Overview

Core task: agent learning from experience through reasoning memory. This field explores how agents accumulate, organize, and leverage past experiences to improve decision-making and reasoning over time. The taxonomy reveals a multifaceted landscape organized into seven main branches. Memory Architecture and Representation addresses how agents structure and encode experiential knowledge, ranging from episodic traces to knowledge graphs. Memory-Based Learning Mechanisms focuses on how agents extract lessons from stored experiences, including reflection-driven approaches and verbal reinforcement paradigms. Retrieval and Memory Operations examines the computational processes for accessing relevant past information, while Reasoning and Planning with Memory investigates how agents integrate retrieved experiences into forward-looking decision processes. Cognitive and Neuroscience-Inspired Frameworks draw on biological memory systems to inform agent design, and Domain-Specific Applications and Embodied Agents apply these principles to robotics, navigation, and interactive environments. Finally, Surveys and Theoretical Foundations provide overarching perspectives on the field's conceptual underpinnings.

Within Memory-Based Learning Mechanisms, a particularly active line of work centers on reflection and verbal reinforcement learning, where agents iteratively refine their behavior by generating natural language critiques of past actions. Reflexion[1] pioneered this direction by enabling agents to self-reflect on task failures and adjust strategies accordingly, while more recent efforts like R2D2[33] and MetaReflection[34] extend these ideas to multi-step reasoning and meta-level introspection. ReasoningBank[0] situates itself within this cluster by emphasizing the construction of a structured memory bank that captures reasoning traces and supports iterative learning from experience. Compared to Reflexion[1], which focuses on immediate self-correction, ReasoningBank[0] appears to prioritize the accumulation and reuse of reasoning patterns across episodes, aligning closely with works like Self-Evolving Agents[3] that explore long-term knowledge consolidation. This branch highlights an ongoing tension between lightweight, episode-specific reflection and more persistent, architecturally integrated memory systems that scale across diverse tasks.

Claimed Contributions

ReasoningBank memory framework

A memory framework that distills high-level reasoning strategies from both successful and failed agent experiences into structured, reusable memory items (with title, description, and content), enabling agents to generalize across tasks and evolve over time rather than storing only raw trajectories or successful routines.

10 retrieved papers (Can Refute)
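To make the structure of a memory item concrete, the following is a minimal sketch, not the paper's implementation. The three text fields (title, description, content) come from the contribution description above; the `from_success` flag, the `retrieve` helper, the word-overlap scoring, and the example items are all hypothetical stand-ins, since this report does not describe how items are tagged or retrieved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """One reusable strategy; the three text fields follow the report."""
    title: str
    description: str
    content: str
    from_success: bool  # hypothetical flag: the report only says both outcomes are used


def _tokens(text: str) -> set[str]:
    return set(text.lower().split())


def retrieve(bank: list[MemoryItem], task: str, k: int = 1) -> list[MemoryItem]:
    """Rank items by word overlap with the task (a stand-in for real retrieval)."""
    query = _tokens(task)

    def score(item: MemoryItem) -> int:
        return len(query & _tokens(f"{item.title} {item.description}"))

    return sorted(bank, key=score, reverse=True)[:k]


# Invented example items, one distilled from a failure and one from a success.
bank = [
    MemoryItem(
        title="Check pagination before concluding",
        description="search results may span several pages",
        content="Scan for a next-page control before reporting that an item is missing.",
        from_success=False,
    ),
    MemoryItem(
        title="Reproduce the bug first",
        description="failing test before editing code",
        content="Write a minimal failing test, then change the code until it passes.",
        from_success=True,
    ),
]

top = retrieve(bank, "find an order in paginated search results")
print(top[0].title)  # Check pagination before concluding
```

Keeping the title and description short and separate from the content is one plausible reason for the three-field layout: the short fields can be matched against a new task cheaply, and only the selected item's content needs to be placed in the agent's context.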
Memory-aware test-time scaling (MaTTS)

A test-time scaling approach that creates bidirectional synergy between memory and scaling: memory guides scaling toward more promising explorations, while diverse rollouts from scaling provide rich contrastive signals for higher-quality memory curation in both parallel and sequential settings.

7 retrieved papers
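The bidirectional loop described for MaTTS can be illustrated for the parallel setting with a small sketch. Everything here is an assumption made for illustration: the function names, the `toy_rollout` and `toy_judge` stand-ins, and in particular the set-difference rule used to extract a contrastive signal, which merely stands in for whatever distillation procedure the paper actually uses. The sequential setting is not covered.

```python
from typing import Callable

# Hypothetical interfaces: a real agent would call an LLM and an environment.
Rollout = Callable[[str, list[str], int], list[str]]  # (task, memory, seed) -> actions
Judge = Callable[[str, list[str]], bool]              # (task, actions) -> success?


def parallel_scaling_step(
    task: str,
    memory: list[str],
    rollout: Rollout,
    judge: Judge,
    k: int = 3,
) -> list[str]:
    """Run k memory-conditioned rollouts, then contrast wins against losses."""
    trajectories = [rollout(task, memory, seed) for seed in range(k)]
    wins = [t for t in trajectories if judge(task, t)]
    losses = [t for t in trajectories if not judge(task, t)]

    new_items: list[str] = []
    if wins and losses:
        # Contrastive signal (illustrative rule): actions every win used
        # that no loss used.
        shared = set.intersection(*(set(t) for t in wins))
        seen_in_losses = set().union(*(set(t) for t in losses))
        for action in sorted(shared - seen_in_losses):
            new_items.append(f"Prefer: {action}")
    return memory + [item for item in new_items if item not in memory]


# Toy stand-ins so the sketch runs without an LLM or environment.
def toy_rollout(task: str, memory: list[str], seed: int) -> list[str]:
    actions = ["open search"]
    if seed % 2 == 0 or "Prefer: apply date filter" in memory:
        actions.append("apply date filter")
    actions.append("read first result")
    return actions


def toy_judge(task: str, actions: list[str]) -> bool:
    return "apply date filter" in actions


memory = parallel_scaling_step("find latest invoice", [], toy_rollout, toy_judge)
print(memory)  # ['Prefer: apply date filter']
```

In this toy setup the first call yields a mix of successful and failed rollouts, so a new item is distilled. A second call with the updated memory steers every rollout to success, which leaves no contrast and therefore adds nothing new.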
Memory-driven experience as a new scaling dimension

The work establishes memory-driven experience as a novel dimension for test-time scaling in agent systems, demonstrating that agents can develop increasingly complex emergent reasoning strategies and self-evolving capabilities through the interaction of memory and scaling.

10 retrieved papers
