ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Overview
Overall Novelty Assessment
The paper proposes ReasoningBank, a memory framework enabling agents to self-curate generalizable reasoning strategies from both successful and failed experiences, and introduces memory-aware test-time scaling (MaTTS) to accelerate learning. Within the taxonomy, it resides in the 'Reflection and Verbal Reinforcement Learning' leaf under 'Memory-Based Learning Mechanisms', alongside three sibling papers. This leaf represents a moderately populated research direction focused on linguistic feedback and self-reflection without parametric updates, situated within a broader branch containing four distinct learning paradigms across the field's 50 papers.
The taxonomy reveals neighboring research directions that contextualize this work's positioning. Adjacent leaves include 'Self-Evolving and Lifelong Learning Agents' emphasizing continuous capability improvement, 'Experience Replay and Trajectory Synthesis' leveraging stored trajectories for sample efficiency, and 'Cross-Domain Experience Sharing' enabling knowledge transfer across tasks. The scope note for the paper's leaf explicitly excludes parametric weight updates, distinguishing reflection-based approaches from gradient-driven methods. This boundary clarifies that ReasoningBank operates through memory curation rather than model fine-tuning, connecting it to verbal reinforcement paradigms while diverging from replay-based learning mechanisms in neighboring leaves.
Across the three contributions, 27 candidates were examined. For the ReasoningBank framework, 1 of 10 candidates is refutable, suggesting some overlap with prior memory architectures. For MaTTS, none of the 7 candidates is refutable, indicating sparser prior work on memory-guided test-time scaling. For the third contribution, memory-driven experience as a scaling dimension, none of the 10 candidates is refutable, suggesting this framing may be less explored. Because the search scope is limited, these statistics reflect top semantic matches rather than exhaustive coverage; the single refutable case for ReasoningBank indicates that at least one prior work in the examined set addresses similar memory curation concepts.
Based on the 27 top semantic matches examined, the work appears to introduce a novel combination of memory curation with test-time scaling, though the ReasoningBank framework itself shows some overlap with existing memory architectures. The analysis covers semantically proximate papers but does not guarantee exhaustive coverage of the field, particularly for works that use different terminology or appear in specialized venues. The taxonomy positioning suggests the paper bridges reflection-based learning and scaling paradigms, occupying a moderately explored niche within the broader agent learning landscape.
Taxonomy
Research Landscape Overview
Claimed Contributions
A memory framework that distills high-level reasoning strategies from both successful and failed agent experiences into structured, reusable memory items (with title, description, and content), enabling agents to generalize across tasks and evolve over time rather than storing only raw trajectories or successful routines.
A test-time scaling approach that creates bidirectional synergy between memory and scaling: memory guides scaling toward more promising explorations, while diverse rollouts from scaling provide rich contrastive signals for higher-quality memory curation in both parallel and sequential settings.
The work establishes memory-driven experience as a novel dimension for test-time scaling in agent systems, demonstrating that agents can develop increasingly complex emergent reasoning strategies and self-evolving capabilities through the interaction of memory and scaling.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Reflexion: language agents with verbal reinforcement learning
[33] R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory
[34] MetaReflection: Learning Instructions for Language Agents using Past Reflections
Contribution Analysis
Detailed comparisons for each claimed contribution
ReasoningBank memory framework
A memory framework that distills high-level reasoning strategies from both successful and failed agent experiences into structured, reusable memory items (with title, description, and content), enabling agents to generalize across tasks and evolve over time rather than storing only raw trajectories or successful routines.
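To make the memory-item structure described above concrete, here is a minimal Python sketch. Only the three fields (title, description, content) and the idea of learning from both successes and failures come from the description; the `distill` and `retrieve` helpers, their names, and the word-overlap ranking are illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    """One reusable strategy; the three fields follow the description above."""
    title: str
    description: str
    content: str


def distill(task: str, succeeded: bool, lesson: str) -> MemoryItem:
    """Hypothetical stand-in for an LLM-based distillation step.

    Failed attempts are kept as well, framed as pitfalls to avoid.
    """
    kind = "Strategy" if succeeded else "Pitfall"
    outcome = "successful" if succeeded else "failed"
    return MemoryItem(
        title=f"{kind}: {task}",
        description=f"Learned from a {outcome} attempt.",
        content=lesson,
    )


def retrieve(bank: list[MemoryItem], query: str, k: int = 1) -> list[MemoryItem]:
    """Toy retrieval by word overlap (an assumption made for this sketch)."""
    words = set(query.lower().split())

    def overlap(item: MemoryItem) -> int:
        text = f"{item.title} {item.description} {item.content}".lower()
        return len(words & set(text.split()))

    # Highest-overlap items first; ties keep insertion order.
    return sorted(bank, key=overlap, reverse=True)[:k]


bank = [
    distill("find cheapest flight", True, "Sort results by price before paging."),
    distill("submit checkout form", False, "Verify required fields before clicking submit."),
]
top = retrieve(bank, "checkout form fails on submit")
```

Storing a short strategy rather than the raw trajectory is what lets the same item be retrieved for a different but related task.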
[61] From Experience to Strategy: Empowering LLM Agents with Trainable Graph Memory
[2] Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team
[18] Agent kb: Leveraging cross-domain experience for agentic problem solving
[58] Curating Demonstrations using Online Experience
[59] MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning
[60] SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning
[62] Reasoning, Memorization, and Fine-Tuning Language Models for Non-Cooperative Games
[63] AgentEvolver: Towards Efficient Self-Evolving Agent System
[64] Table-critic: A multi-agent framework for collaborative criticism and refinement in table reasoning
[65] GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Memory-aware test-time scaling (MaTTS)
A test-time scaling approach that creates bidirectional synergy between memory and scaling: memory guides scaling toward more promising explorations, while diverse rollouts from scaling provide rich contrastive signals for higher-quality memory curation in both parallel and sequential settings.
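The bidirectional loop described above can be sketched for the parallel setting as follows. This is a rough illustration under stated assumptions: `parallel_matts`, `toy_agent`, the string-based memory, and the way successes and failures are contrasted are all hypothetical, and the sequential setting is not covered.

```python
import random
from typing import Callable

Rollout = tuple[str, bool]  # (trajectory summary, success flag)


def parallel_matts(
    task: str,
    memory: list[str],
    run_agent: Callable[[str, list[str], random.Random], Rollout],
    k: int = 4,
    seed: int = 0,
) -> tuple[str, list[str]]:
    """Run k memory-guided rollouts, then curate memory from their contrast."""
    rng = random.Random(seed)
    # Memory -> scaling: every rollout is conditioned on the current memory.
    rollouts = [run_agent(task, memory, rng) for _ in range(k)]
    wins = [t for t, ok in rollouts if ok]
    losses = [t for t, ok in rollouts if not ok]

    # Scaling -> memory: contrast successes and failures when both exist.
    if wins and losses:
        item = f"For '{task}': prefer [{wins[0]}] over [{losses[0]}]"
    elif wins:
        item = f"For '{task}': [{wins[0]}] worked"
    else:
        item = f"For '{task}': avoid [{losses[0]}]"

    answer = wins[0] if wins else rollouts[0][0]
    return answer, memory + [item]


def toy_agent(task: str, memory: list[str], rng: random.Random) -> Rollout:
    """Hypothetical agent that succeeds more often when memory is available."""
    ok = rng.random() < (0.8 if memory else 0.4)
    return f"plan-{rng.randrange(100)}", ok


answer, memory = parallel_matts("book a flight", [], toy_agent)
```

In this sketch each call returns an extended memory list, so passing it into the next task's call is what would let later rollouts benefit from earlier ones.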
[51] Incremental model enhancement via memory-based contrastive learning
[52] Demystifying Diffusion Policies: Action Memorization and Simple Lookup Table Alternatives
[53] BECLR: Batch Enhanced Contrastive Few-Shot Learning
[54] Style-Adaptive Detection Transformer for Single-Source Domain Generalized Object Detection
[55] Multi-camera spatiotemporal deep learning framework for real-time abnormal behavior detection in dense urban environments
[56] SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning
[57] Contrastive Test-Time Adaptation
Memory-driven experience as a new scaling dimension
The work establishes memory-driven experience as a novel dimension for test-time scaling in agent systems, demonstrating that agents can develop increasingly complex emergent reasoning strategies and self-evolving capabilities through the interaction of memory and scaling.