ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

ICLR 2026 Conference Submission • Anonymous Authors

OpenReview Score: 6.5

LLM Agents, Memory Mechanism, Reasoning, Test-Time Scaling

Abstract:

With the growing adoption of large language model (LLM) agents in persistent, real-world roles, they naturally encounter continuous streams of tasks and interactions. A key limitation, however, is their failure to learn from this accumulated experience, forcing them to discard valuable insights and repeat past errors. Unlike prior works that primarily store raw experience or successful routines, we propose ReasoningBank, a novel memory framework that allows an agent to self-curate generalizable reasoning strategies from both its successful and failed experiences for future leverage. This mechanism enables agents to generalize across tasks and become more capable over time. To accelerate and diversify this test-time learning process, we further propose memory-aware test-time scaling (MaTTS), which leverages a powerful synergy between memory and test-time scaling. On one hand, relevant memory from ReasoningBank guides the scaling process toward more effective exploration and improved reliability. On the other, scaling, in both parallel and sequential settings, generates abundant, diverse experiences that provide rich contrastive signals for synthesizing higher-quality memory. Experiments on web browsing and software engineering tasks show that ReasoningBank consistently outperforms existing memory mechanisms in both effectiveness and efficiency, with MaTTS further amplifying the gains. These findings position memory-driven experience as a new dimension of test-time scaling, where emergent behaviors naturally arise and agents acquire self-evolving capabilities.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ReasoningBank, a memory framework enabling agents to self-curate generalizable reasoning strategies from both successful and failed experiences, and introduces memory-aware test-time scaling (MaTTS) to accelerate learning. Within the taxonomy, it resides in the 'Reflection and Verbal Reinforcement Learning' leaf under 'Memory-Based Learning Mechanisms', alongside three sibling papers. This leaf represents a moderately populated research direction focused on linguistic feedback and self-reflection without parametric updates, situated within a broader branch containing four distinct learning paradigms across the field's 50 papers.

The taxonomy reveals neighboring research directions that contextualize this work's positioning. Adjacent leaves include 'Self-Evolving and Lifelong Learning Agents' emphasizing continuous capability improvement, 'Experience Replay and Trajectory Synthesis' leveraging stored trajectories for sample efficiency, and 'Cross-Domain Experience Sharing' enabling knowledge transfer across tasks. The scope note for the paper's leaf explicitly excludes parametric weight updates, distinguishing reflection-based approaches from gradient-driven methods. This boundary clarifies that ReasoningBank operates through memory curation rather than model fine-tuning, connecting it to verbal reinforcement paradigms while diverging from replay-based learning mechanisms in neighboring leaves.

Across the three contributions, 27 candidate papers were examined. For the ReasoningBank framework, one of the 10 candidates examined can refute the novelty claim, suggesting some overlap with prior memory architectures. For MaTTS, none of the 7 candidates examined can refute it, indicating relatively sparse prior work on memory-guided test-time scaling. For the third contribution, memory-driven experience as a scaling dimension, none of the 10 candidates examined can refute it, suggesting this framing is less explored. Because the search scope is limited, these statistics reflect top semantic matches rather than exhaustive coverage; the single refuting case indicates that at least one prior work in the examined set addresses similar memory curation concepts.
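The counts in the paragraph above can be cross-checked with a short tally. This is only an arithmetic sketch: the per-contribution numbers are copied from this report, while the dictionary layout and variable names are assumptions made for illustration.

```python
# (candidates examined, candidates that can refute), as stated in this report.
# The dictionary layout and names are illustrative, not part of the report's pipeline.
candidates = {
    "ReasoningBank memory framework": (10, 1),
    "Memory-aware test-time scaling (MaTTS)": (7, 0),
    "Memory-driven experience as a new scaling dimension": (10, 0),
}

total_examined = sum(examined for examined, _ in candidates.values())
total_refuting = sum(refuting for _, refuting in candidates.values())

print(f"examined: {total_examined}, can refute: {total_refuting}")
# prints "examined: 27, can refute: 1"
```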

Based on the top-27 semantic matches examined, the work appears to introduce novel combinations of memory curation with test-time scaling, though the ReasoningBank framework itself shows some overlap with existing memory architectures. The analysis covers semantically proximate papers but does not guarantee exhaustive field coverage, particularly for works using different terminology or published in specialized venues. The taxonomy positioning suggests the paper bridges reflection-based learning with scaling paradigms, occupying a moderately explored niche within the broader agent learning landscape.

Taxonomy

[Taxonomy figure not reproduced. Legend: Core-task Taxonomy Papers; Claimed Contributions; Contribution Candidate Papers Compared; Refutable Paper]

Research Landscape Overview

Core task: agent learning from experience through reasoning memory. This field explores how agents accumulate, organize, and leverage past experiences to improve decision-making and reasoning over time. The taxonomy reveals a multifaceted landscape organized into seven main branches. Memory Architecture and Representation addresses how agents structure and encode experiential knowledge, ranging from episodic traces to knowledge graphs. Memory-Based Learning Mechanisms focuses on how agents extract lessons from stored experiences, including reflection-driven approaches and verbal reinforcement paradigms. Retrieval and Memory Operations examines the computational processes for accessing relevant past information, while Reasoning and Planning with Memory investigates how agents integrate retrieved experiences into forward-looking decision processes. Cognitive and Neuroscience-Inspired Frameworks draw on biological memory systems to inform agent design, and Domain-Specific Applications and Embodied Agents apply these principles to robotics, navigation, and interactive environments. Finally, Surveys and Theoretical Foundations provide overarching perspectives on the field's conceptual underpinnings.

Within Memory-Based Learning Mechanisms, a particularly active line of work centers on reflection and verbal reinforcement learning, where agents iteratively refine their behavior by generating natural language critiques of past actions. Reflexion[1] pioneered this direction by enabling agents to self-reflect on task failures and adjust strategies accordingly, while more recent efforts like R2D2[33] and MetaReflection[34] extend these ideas to multi-step reasoning and meta-level introspection. ReasoningBank[0] situates itself within this cluster by emphasizing the construction of a structured memory bank that captures reasoning traces and supports iterative learning from experience. Compared to Reflexion[1], which focuses on immediate self-correction, ReasoningBank[0] appears to prioritize the accumulation and reuse of reasoning patterns across episodes, aligning closely with works like Self-Evolving Agents[3] that explore long-term knowledge consolidation. This branch highlights an ongoing tension between lightweight, episode-specific reflection and more persistent, architecturally integrated memory systems that scale across diverse tasks.

Claimed Contributions

ReasoningBank memory framework

Can Refute

10 retrieved papers

A memory framework that distills high-level reasoning strategies from both successful and failed agent experiences into structured, reusable memory items (with title, description, and content), enabling agents to generalize across tasks and evolve over time rather than storing only raw trajectories or successful routines.
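As a rough illustration of the item structure described above, the following Python sketch models a memory item with the three fields named in this report (title, description, content). Everything else is an assumption made for illustration: the `ToyReasoningBank` class, the `from_success` flag, and the word-overlap retrieval are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemoryItem:
    """One reusable strategy, with the three fields the report names."""
    title: str
    description: str
    content: str
    from_success: bool = True  # hypothetical flag, not described in the report


@dataclass
class ToyReasoningBank:
    """Hypothetical in-memory store; retrieval is plain word overlap."""
    items: list[MemoryItem] = field(default_factory=list)

    def add(self, item: MemoryItem) -> None:
        self.items.append(item)

    def retrieve(self, query: str, k: int = 1) -> list[MemoryItem]:
        query_words = set(query.lower().split())

        def overlap(item: MemoryItem) -> int:
            text = f"{item.title} {item.description}".lower()
            return len(query_words & set(text.split()))

        # sorted() is stable, so tied items keep their insertion order.
        return sorted(self.items, key=overlap, reverse=True)[:k]


bank = ToyReasoningBank()
bank.add(MemoryItem(
    title="Check pagination before concluding",
    description="Use when a web search result list looks incomplete",
    content="Open the next page before deciding an item is absent.",
))
bank.add(MemoryItem(
    title="Avoid editing generated files",
    description="Lesson from a failed code repository patch",
    content="Locate the source template and change that instead.",
    from_success=False,  # items may come from failed experiences too
))

top = bank.retrieve("web search result page", k=1)
print(top[0].title)
```

A real system would presumably use embedding-based retrieval and an LLM to write the items; the word-overlap scorer here only keeps the sketch self-contained.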

Memory-aware test-time scaling (MaTTS)

7 retrieved papers

A test-time scaling approach that creates bidirectional synergy between memory and scaling: memory guides scaling toward more promising explorations, while diverse rollouts from scaling provide rich contrastive signals for higher-quality memory curation in both parallel and sequential settings.
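The bidirectional loop described above can be sketched for the parallel setting as follows. This is a minimal toy, not the paper's algorithm: `parallel_scaling_step`, `toy_agent`, and the string-based "distillation" are hypothetical stand-ins, and in practice both the rollouts and the contrastive summary would presumably be produced by an LLM.

```python
from typing import Callable

Rollout = tuple[str, bool]  # (trajectory text, whether the attempt succeeded)


def parallel_scaling_step(
    task: str,
    memory: list[str],
    run_agent: Callable[[str, list[str], int], Rollout],
    k: int = 4,
) -> list[str]:
    """Run k memory-guided rollouts, then return memory plus a contrastive note."""
    # Memory -> scaling: every rollout receives the current memory.
    rollouts = [run_agent(task, memory, seed) for seed in range(k)]
    wins = [traj for traj, ok in rollouts if ok]
    losses = [traj for traj, ok in rollouts if not ok]

    # Scaling -> memory: contrast what worked with what did not.
    new_notes = []
    if wins and losses:
        new_notes.append(f"[{task}] prefer: {wins[0]} | avoid: {losses[0]}")
    elif wins:
        new_notes.append(f"[{task}] prefer: {wins[0]}")
    elif losses:
        new_notes.append(f"[{task}] avoid: {losses[0]}")
    return memory + new_notes


def toy_agent(task: str, memory: list[str], seed: int) -> Rollout:
    """Stand-in for an LLM agent: even seeds succeed, odd seeds fail."""
    return (f"plan-{seed} with {len(memory)} memory items", seed % 2 == 0)


memory: list[str] = []
memory = parallel_scaling_step("find order total", memory, toy_agent, k=4)
memory = parallel_scaling_step("find order total", memory, toy_agent, k=4)
for note in memory:
    print(note)
```

The second call sees the note written by the first, which is the sense in which memory and scaling feed each other in this sketch.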

Memory-driven experience as a new scaling dimension

10 retrieved papers

The work establishes memory-driven experience as a novel dimension for test-time scaling in agent systems, demonstrating that agents can develop increasingly complex emergent reasoning strategies and self-evolving capabilities through the interaction of memory and scaling.

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[1] Reflexion: language agents with verbal reinforcement learning

Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Beck Labash, Karthik Narasimhan, Shunyu Yao (2023) • Neural Information Processing Systems

[33] R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory

Tenghao Huang, Kinjal Basu, Ibrahim Abdelaziz, Pavan Kapanipathi, Jonathan May, Muhao Chen (2025)

[34] MetaReflection: Learning Instructions for Language Agents using Past Reflections

Priyanshu Gupta, Shashank Kirtania, Sherry Shi, Sumit Gulwani, Arjun Radhakrishna, Gustavo Soares (2024) • Conference on Empirical Methods in Natural Language Processing

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: ReasoningBank memory framework

[61] From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory • Can Refute

[2] Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team • Cannot Refute

[18] Agent kb: Leveraging cross-domain experience for agentic problem solving • Cannot Refute

[58] Curating Demonstrations using Online Experience • Cannot Refute

[59] MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning • Cannot Refute

[60] SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning • Cannot Refute

[62] Reasoning, Memorization, and Fine-Tuning Language Models for Non-Cooperative Games • Cannot Refute

[63] AgentEvolver: Towards Efficient Self-Evolving Agent System • Cannot Refute

[64] Table-critic: A multi-agent framework for collaborative criticism and refinement in table reasoning • Cannot Refute

[65] GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training • Cannot Refute

Contribution: Memory-aware test-time scaling (MaTTS)

[51] Incremental model enhancement via memory-based contrastive learning • Cannot Refute

[52] Demystifying Diffusion Policies: Action Memorization and Simple Lookup Table Alternatives • Cannot Refute

[53] BECLR: Batch Enhanced Contrastive Few-Shot Learning • Cannot Refute

[54] Style-Adaptive Detection Transformer for Single-Source Domain Generalized Object Detection • Cannot Refute

[55] Multi-camera spatiotemporal deep learning framework for real-time abnormal behavior detection in dense urban environments • Cannot Refute

[56] SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning • Cannot Refute

[57] Contrastive Test-Time Adaptation • Cannot Refute

Contribution: Memory-driven experience as a new scaling dimension

[13] SAGE: Self-evolving Agents with Reflective and Memory-augmented Abilities • Cannot Refute

[66] Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory • Cannot Refute

[67] MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning • Cannot Refute

[68] A survey of self-evolving agents: On path to artificial super intelligence • Cannot Refute

[69] G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems • Cannot Refute

[70] Agentnet: Decentralized evolutionary coordination for llm-based multi-agent systems • Cannot Refute

[71] Alita-g: Self-evolving generative agent for agent generation • Cannot Refute

[72] Memory Sharing for Large Language Model based Agents • Cannot Refute

[73] Chemagent: Self-updating memories in large language models improves chemical reasoning • Cannot Refute

[74] VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models • Cannot Refute
