MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: deep research, reasoning, context compression
Abstract:

Modern language agents often need to solve tasks requiring long-horizon, multi-turn interactions, where they retrieve external information, adapt to observations, and answer interdependent queries. Yet most LLM systems rely on full-context prompting, appending all past turns regardless of their relevance. This leads to unbounded memory growth, increased computational cost, and degraded reasoning on out-of-distribution input lengths as the LLM loses track of earlier context. We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with a constant context size when solving long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. Leveraging reinforcement learning (RL) and rollout-trajectory truncation, we train a MEM1 agent to develop internal states that integrate prior memory with new observations from the environment while strategically discarding irrelevant or redundant information. Experiments across three domains, including internal retrieval QA, open-domain web QA, and multi-turn web shopping, show that MEM1-7B improves performance by 3.5× while reducing memory usage by 3.7× compared to Qwen2.5-14B-Instruct on an augmented multi-hop QA dataset with 16 objectives per task, and generalizes beyond the training horizon. Our results demonstrate the promise of reasoning-driven memory consolidation as a scalable alternative for training long-horizon, multi-interaction agents in which both efficiency and performance are optimized.
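The constant-context behavior described in the abstract can be sketched as a minimal agent loop in which the prompt is rebuilt each turn from the consolidated internal state plus only the newest observation, so the context never accumulates past turns. This is an illustrative sketch under assumptions: the `<state>`/`<action>` tag format, the `llm` callable, and the `env` interface are hypothetical stand-ins, not the authors' actual implementation.

```python
import re

def extract(text, tag):
    """Pull the content of an assumed <tag>...</tag> block from model output."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.S)
    return m.group(1).strip() if m else ""

def run_episode(llm, env, query, max_turns=8):
    """Each turn rebuilds the prompt from a compact state + newest observation;
    the full interaction history is never appended, so context stays constant."""
    state = ""                        # consolidated internal state (memory + reasoning)
    obs = env.reset(query)
    for _ in range(max_turns):
        prompt = (
            f"Task: {query}\n"
            f"Internal state: {state}\n"
            f"New observation: {obs}\n"
            "Rewrite <state>...</state>, then give <action>...</action>."
        )
        out = llm(prompt)
        state = extract(out, "state")   # overwrite, never append: past turns
        action = extract(out, "action") # survive only via the state summary
        if action.startswith("answer:"):
            return action[len("answer:"):].strip()
        obs = env.step(action)          # e.g. a search-engine result
    return None
```

The key design point is that `state` is overwritten rather than extended; whatever the model fails to carry into the rewritten state is discarded, which is exactly the pressure the RL objective is said to exploit.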

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces MEM1, a reinforcement learning framework that maintains constant context size for long-horizon multi-turn agents through memory consolidation and reasoning. It resides in the Memory Consolidation and Compression leaf, which contains four papers including Recursively Summarizing Dialogue, Pre-Storage Reasoning, and Compress to Impress. This leaf sits within Memory Operations and Dynamics, a moderately populated branch addressing update, retrieval, and compression mechanisms. The placement suggests the paper targets an active but not overcrowded research direction focused on distilling interaction histories into compact representations.

The taxonomy reveals neighboring leaves addressing complementary challenges: Memory Retrieval and Selection Strategies focuses on fetching relevant elements rather than compression, while Memory Update and Maintenance Policies handles dynamic content modification. Adjacent branches include Long-Horizon Task Execution and Planning, which examines multi-turn reasoning through reinforcement learning and multi-agent decomposition, and Conversational Interaction and Personalization, which emphasizes dialogue coherence and user modeling. MEM1 bridges consolidation techniques with RL-driven task execution, connecting memory compression goals to the broader challenge of sustained agent performance across extended interactions.

Among thirty candidates examined, none clearly refuted the three core contributions: the RL framework for memory-efficient agents, the unified consolidation-reasoning mechanism, and multi-objective task augmentation. Each contribution was assessed against ten candidates with zero refutable overlaps identified. This suggests the specific combination of RL-driven consolidation, constant-size context maintenance, and joint memory-reasoning updates may represent a relatively unexplored configuration within the limited search scope. However, the analysis does not claim exhaustive coverage; sibling papers like Recursively Summarizing Dialogue and Pre-Storage Reasoning address related compression challenges through different technical approaches.

Based on the top-thirty semantic matches and taxonomy structure, MEM1 appears to occupy a distinct position within memory consolidation research by integrating RL training with constant-context constraints. The absence of refutable candidates across contributions indicates potential novelty in the specific technical synthesis, though the limited search scope precludes definitive claims about the broader literature. The work's placement among four sibling papers in an active leaf suggests it contributes to an established research direction while potentially introducing new methodological angles.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: memory consolidation for long-horizon multi-turn language agents. The field addresses how agents maintain and refine memory across extended interactions, organizing research into several interconnected branches. Memory Architecture and Representation explores structural designs such as hierarchical stores and multi-granularity indexing (e.g., Multi-Granularity Memory Association[6], MemoryBank[11]), while Memory Operations and Dynamics focuses on the mechanisms by which agents update, compress, and retrieve information over time. Long-Horizon Task Execution and Planning examines how agents leverage memory to sustain coherent behavior across many turns (e.g., Reinforcement Long-Horizon Agents[1], Hiagent[4]), and Conversational Interaction and Personalization investigates memory's role in dialogue continuity and user modeling (e.g., Interpersonal Memory[12], Beyond Goldfish Memory[8]). Evaluation and Benchmarking provides testbeds like LongMemEval[13] and MemTrack[17], while Theoretical Foundations and Surveys offer broader perspectives (e.g., Rethinking Memory AI[19]), and Domain-Specific Applications tailor memory solutions to areas such as healthcare or virtual characters.

Within Memory Operations and Dynamics, a particularly active line of work centers on memory consolidation and compression—techniques that distill lengthy interaction histories into compact, reusable representations. MEM1[0] sits squarely in this cluster, emphasizing methods to condense multi-turn exchanges without losing critical context. Nearby, Recursively Summarizing Dialogue[2] explores hierarchical summarization strategies, while Pre-Storage Reasoning[18] investigates reasoning steps before committing information to memory, and Compress to Impress[24] examines trade-offs between compression ratios and retrieval fidelity. In contrast, Reflective Memory Management[3] highlights meta-cognitive processes that decide what to retain or discard, adding a layer of strategic oversight.
These works collectively grapple with balancing efficiency and accuracy: aggressive compression risks information loss, yet verbatim storage quickly becomes intractable. MEM1[0] contributes to this ongoing conversation by proposing consolidation mechanisms tailored for long-horizon scenarios, positioning itself among efforts that prioritize scalable, context-aware memory refinement.

Claimed Contributions

MEM1 reinforcement learning framework for memory-efficient long-horizon agents

The authors propose MEM1, a reinforcement learning framework that trains language agents to maintain nearly constant memory usage across long-horizon tasks by consolidating memory and reasoning into a shared internal state, discarding irrelevant information while retaining essential context.

10 retrieved papers
Unified memory consolidation and reasoning mechanism

The method integrates inference-time reasoning with memory consolidation in a single internal state representation, enabling the agent to both reason about current queries and extract essential information for future use without requiring separate memory modules.

10 retrieved papers
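To make the "no separate memory module" claim concrete, the contrast between a conventional external-store design and a unified single-state design can be sketched as follows. Everything here is an assumption for illustration: the `llm` callable, the `" ||| "` response delimiter, and both function shapes are hypothetical, not the paper's API.

```python
def turn_with_separate_memory(llm, memory_store, query, obs):
    """Baseline pattern: an external memory store that grows without bound,
    with reasoning handled separately over the accumulated history."""
    memory_store.append(obs)               # unbounded append-only store
    history = "\n".join(memory_store)
    return llm(f"Memory:\n{history}\nQuestion: {query}")

def turn_with_unified_state(llm, state, query, obs):
    """Unified pattern: one LLM call both reasons about the query and rewrites
    the single internal state; anything left out of the new state is gone."""
    out = llm(
        f"State: {state}\nObservation: {obs}\nQuestion: {query}\n"
        "Reply as '<new state> ||| <answer or next action>'."
    )
    new_state, _, response = out.partition(" ||| ")
    return new_state.strip(), response.strip()
```

The distinction the contribution claims is that the second pattern needs no retrieval or maintenance policy over a growing store: consolidation and reasoning share one representation, and the state itself is the only memory.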
Multi-objective task augmentation for long-horizon training

The authors design a multi-objective QA task by interleaving multiple questions from existing datasets into composite queries, requiring agents to issue multiple searches and organize sub-answers coherently, thereby creating training environments that necessitate memory management over extended horizons.

10 retrieved papers
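The augmentation scheme in the third contribution, interleaving several single-hop questions into one composite query, can be sketched as below. The dataset shape (a list of question-answer pairs), the numbering format, and the function name are assumptions for illustration, not the authors' exact construction.

```python
import random

def make_composite_task(qa_items, k, seed=0):
    """Sample k (question, answer) pairs and fuse them into one composite
    query whose solution requires k separate searches and a coherently
    organized multi-part answer, forcing memory management across turns."""
    rng = random.Random(seed)
    picked = rng.sample(qa_items, k)
    query = "Answer all of the following:\n" + "\n".join(
        f"{i + 1}. {q}" for i, (q, _) in enumerate(picked)
    )
    gold = [a for _, a in picked]      # per-objective reference answers
    return query, gold
```

Under this construction, scaling `k` (the paper reports up to 16 objectives per task) directly lengthens the required interaction horizon, which is what makes the environment a useful training ground for consolidation.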

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

MEM1 reinforcement learning framework for memory-efficient long-horizon agents

The authors propose MEM1, a reinforcement learning framework that trains language agents to maintain nearly constant memory usage across long-horizon tasks by consolidating memory and reasoning into a shared internal state, discarding irrelevant information while retaining essential context.

Contribution

Unified memory consolidation and reasoning mechanism

The method integrates inference-time reasoning with memory consolidation in a single internal state representation, enabling the agent to both reason about current queries and extract essential information for future use without requiring separate memory modules.

Contribution

Multi-objective task augmentation for long-horizon training

The authors design a multi-objective QA task by interleaving multiple questions from existing datasets into composite queries, requiring agents to issue multiple searches and organize sub-answers coherently, thereby creating training environments that necessitate memory management over extended horizons.