MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: deep research, reasoning, context compression
Abstract:

Modern language agents often need to solve tasks requiring long-horizon, multi-turn interactions, where they retrieve external information, adapt to observations, and answer interdependent queries. Yet most LLM systems rely on full-context prompting, appending all past turns regardless of their relevance. This leads to unbounded memory growth, increased computational cost, and degraded reasoning on out-of-distribution input lengths as the LLM loses track of earlier context. We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with a constant context size when solving long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. Leveraging reinforcement learning (RL) and rollout-trajectory truncation, we train a MEM1 agent to develop internal states that integrate prior memory with new observations from the environment while strategically discarding irrelevant or redundant information. Experiments across three domains, including internal retrieval QA, open-domain web QA, and multi-turn web shopping, show that MEM1-7B improves performance by 3.5× while reducing memory usage by 3.7× compared to Qwen2.5-14B-Instruct on an augmented multi-hop QA dataset with 16 objectives per task, and generalizes beyond the training horizon. Our results demonstrate the promise of reasoning-driven memory consolidation as a scalable alternative for training long-horizon, multi-interaction agents in which both efficiency and performance are optimized.
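The constant-context behavior described in the abstract can be sketched as a minimal agent loop in which the prompt is rebuilt each turn from the consolidated internal state plus only the newest observation, so the context never accumulates past turns. This is an illustrative sketch under assumptions: the `<state>`/`<action>` tag format, the `llm` callable, and the `env` interface are hypothetical stand-ins, not the authors' actual implementation.

```python
import re

def extract(text, tag):
    """Pull the content of an assumed <tag>...</tag> block from model output."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.S)
    return m.group(1).strip() if m else ""

def run_episode(llm, env, query, max_turns=8):
    """Each turn rebuilds the prompt from a compact state + newest observation;
    the full interaction history is never appended, so context stays constant."""
    state = ""                        # consolidated internal state (memory + reasoning)
    obs = env.reset(query)
    for _ in range(max_turns):
        prompt = (
            f"Task: {query}\n"
            f"Internal state: {state}\n"
            f"New observation: {obs}\n"
            "Rewrite <state>...</state>, then give <action>...</action>."
        )
        out = llm(prompt)
        state = extract(out, "state")   # overwrite, never append: past turns
        action = extract(out, "action") # survive only via the state summary
        if action.startswith("answer:"):
            return action[len("answer:"):].strip()
        obs = env.step(action)          # e.g. a search-engine result
    return None
```

The key design point is that `state` is overwritten rather than extended; whatever the model fails to carry into the rewritten state is discarded, which is exactly the pressure the RL objective is said to exploit.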

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces MEM1, a reinforcement learning framework that maintains constant context size for long-horizon multi-turn agents through memory consolidation and reasoning. It resides in the Memory Consolidation and Compression leaf, which contains four papers including Recursively Summarizing Dialogue, Pre-Storage Reasoning, and Compress to Impress. This leaf sits within Memory Operations and Dynamics, a moderately populated branch addressing update, retrieval, and compression mechanisms. The placement suggests the paper targets an active but not overcrowded research direction focused on distilling interaction histories into compact representations.

The taxonomy reveals neighboring leaves addressing complementary challenges: Memory Retrieval and Selection Strategies focuses on fetching relevant elements rather than compression, while Memory Update and Maintenance Policies handles dynamic content modification. Adjacent branches include Long-Horizon Task Execution and Planning, which examines multi-turn reasoning through reinforcement learning and multi-agent decomposition, and Conversational Interaction and Personalization, which emphasizes dialogue coherence and user modeling. MEM1 bridges consolidation techniques with RL-driven task execution, connecting memory compression goals to the broader challenge of sustained agent performance across extended interactions.

Among thirty candidates examined, none clearly refuted the three core contributions: the RL framework for memory-efficient agents, the unified consolidation-reasoning mechanism, and multi-objective task augmentation. Each contribution was assessed against ten candidates with zero refutable overlaps identified. This suggests the specific combination of RL-driven consolidation, constant-size context maintenance, and joint memory-reasoning updates may represent a relatively unexplored configuration within the limited search scope. However, the analysis does not claim exhaustive coverage; sibling papers like Recursively Summarizing Dialogue and Pre-Storage Reasoning address related compression challenges through different technical approaches.

Based on the top-thirty semantic matches and taxonomy structure, MEM1 appears to occupy a distinct position within memory consolidation research by integrating RL training with constant-context constraints. The absence of refutable candidates across contributions indicates potential novelty in the specific technical synthesis, though the limited search scope precludes definitive claims about the broader literature. The work's placement among four sibling papers in an active leaf suggests it contributes to an established research direction while potentially introducing new methodological angles.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: memory consolidation for long-horizon multi-turn language agents. The field addresses how agents maintain and refine memory across extended interactions, organizing research into several interconnected branches. Memory Architecture and Representation explores structural designs such as hierarchical stores and multi-granularity indexing (e.g., Multi-Granularity Memory Association[6], MemoryBank[11]), while Memory Operations and Dynamics focuses on the mechanisms by which agents update, compress, and retrieve information over time. Long-Horizon Task Execution and Planning examines how agents leverage memory to sustain coherent behavior across many turns (e.g., Reinforcement Long-Horizon Agents[1], Hiagent[4]), and Conversational Interaction and Personalization investigates memory's role in dialogue continuity and user modeling (e.g., Interpersonal Memory[12], Beyond Goldfish Memory[8]). Evaluation and Benchmarking provides testbeds like LongMemEval[13] and MemTrack[17], while Theoretical Foundations and Surveys offer broader perspectives (e.g., Rethinking Memory AI[19]), and Domain-Specific Applications tailor memory solutions to areas such as healthcare or virtual characters.

Within Memory Operations and Dynamics, a particularly active line of work centers on memory consolidation and compression—techniques that distill lengthy interaction histories into compact, reusable representations. MEM1[0] sits squarely in this cluster, emphasizing methods to condense multi-turn exchanges without losing critical context. Nearby, Recursively Summarizing Dialogue[2] explores hierarchical summarization strategies, while Pre-Storage Reasoning[18] investigates reasoning steps before committing information to memory, and Compress to Impress[24] examines trade-offs between compression ratios and retrieval fidelity. In contrast, Reflective Memory Management[3] highlights meta-cognitive processes that decide what to retain or discard, adding a layer of strategic oversight.
These works collectively grapple with balancing efficiency and accuracy: aggressive compression risks information loss, yet verbatim storage quickly becomes intractable. MEM1[0] contributes to this ongoing conversation by proposing consolidation mechanisms tailored for long-horizon scenarios, positioning itself among efforts that prioritize scalable, context-aware memory refinement.

Claimed Contributions

MEM1 reinforcement learning framework for memory-efficient long-horizon agents

The authors propose MEM1, a reinforcement learning framework that trains language agents to maintain nearly constant memory usage across long-horizon tasks by consolidating memory and reasoning into a shared internal state, discarding irrelevant information while retaining essential context.

10 retrieved papers
Unified memory consolidation and reasoning mechanism

The method integrates inference-time reasoning with memory consolidation in a single internal state representation, enabling the agent to both reason about current queries and extract essential information for future use without requiring separate memory modules.

10 retrieved papers
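To make the "no separate memory module" claim concrete, the contrast between a conventional external-store design and a unified single-state design can be sketched as follows. Everything here is an assumption for illustration: the `llm` callable, the `" ||| "` response delimiter, and both function shapes are hypothetical, not the paper's API.

```python
def turn_with_separate_memory(llm, memory_store, query, obs):
    """Baseline pattern: an external memory store that grows without bound,
    with reasoning handled separately over the accumulated history."""
    memory_store.append(obs)               # unbounded append-only store
    history = "\n".join(memory_store)
    return llm(f"Memory:\n{history}\nQuestion: {query}")

def turn_with_unified_state(llm, state, query, obs):
    """Unified pattern: one LLM call both reasons about the query and rewrites
    the single internal state; anything left out of the new state is gone."""
    out = llm(
        f"State: {state}\nObservation: {obs}\nQuestion: {query}\n"
        "Reply as '<new state> ||| <answer or next action>'."
    )
    new_state, _, response = out.partition(" ||| ")
    return new_state.strip(), response.strip()
```

The distinction the contribution claims is that the second pattern needs no retrieval or maintenance policy over a growing store: consolidation and reasoning share one representation, and the state itself is the only memory.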
Multi-objective task augmentation for long-horizon training

The authors design a multi-objective QA task by interleaving multiple questions from existing datasets into composite queries, requiring agents to issue multiple searches and organize sub-answers coherently, thereby creating training environments that necessitate memory management over extended horizons.

10 retrieved papers
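The augmentation scheme in the third contribution, interleaving several single-hop questions into one composite query, can be sketched as below. The dataset shape (a list of question-answer pairs), the numbering format, and the function name are assumptions for illustration, not the authors' exact construction.

```python
import random

def make_composite_task(qa_items, k, seed=0):
    """Sample k (question, answer) pairs and fuse them into one composite
    query whose solution requires k separate searches and a coherently
    organized multi-part answer, forcing memory management across turns."""
    rng = random.Random(seed)
    picked = rng.sample(qa_items, k)
    query = "Answer all of the following:\n" + "\n".join(
        f"{i + 1}. {q}" for i, (q, _) in enumerate(picked)
    )
    gold = [a for _, a in picked]      # per-objective reference answers
    return query, gold
```

Under this construction, scaling `k` (the paper reports up to 16 objectives per task) directly lengthens the required interaction horizon, which is what makes the environment a useful training ground for consolidation.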

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

MEM1 reinforcement learning framework for memory-efficient long-horizon agents

The authors propose MEM1, a reinforcement learning framework that trains language agents to maintain nearly constant memory usage across long-horizon tasks by consolidating memory and reasoning into a shared internal state, discarding irrelevant information while retaining essential context.

Contribution

Unified memory consolidation and reasoning mechanism

The method integrates inference-time reasoning with memory consolidation in a single internal state representation, enabling the agent to both reason about current queries and extract essential information for future use without requiring separate memory modules.

Contribution

Multi-objective task augmentation for long-horizon training

The authors design a multi-objective QA task by interleaving multiple questions from existing datasets into composite queries, requiring agents to issue multiple searches and organize sub-answers coherently, thereby creating training environments that necessitate memory management over extended horizons.