Abstract:

Despite improvements from length extrapolation, efficient attention, and memory modules, handling arbitrarily long documents without performance degradation during extrapolation remains the central challenge in long-text processing. To address this problem, we introduce a novel agent workflow, MemAgent, which processes text in segments and updates a fixed-length memory through an overwrite strategy, tackling long-context tasks through enhanced memory management. We further extend the DAPO algorithm to directly optimize memory ability in an end-to-end fashion, enabling training via independent-context multi-conversation generation. Experimental results demonstrate that MemAgent has strong long-context capabilities: it extrapolates from an 8K context to a 3.5M-token QA task with less than 10% performance loss, and achieves over 95% on the 512K NIAH test.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 1

Research Landscape Overview

Core task: Long-context language model processing with memory management. The field addresses how language models can efficiently handle extended sequences that exceed typical context windows, organizing solutions into several major branches. Memory Architecture and Representation explores structured storage mechanisms such as episodic buffers and hierarchical memory systems (e.g., MemoryBank[6], Cognitive Memory[12]). Attention Mechanisms and Architectural Alternatives investigates alternatives to standard attention, including recurrent designs (Retentive Network[5], RecurrentGemma[27]) and hybrid approaches (Samba[24]). KV Cache Management and Optimization focuses on efficient key-value storage strategies (PagedAttention[13], Attention Sinks[9]), while Context Compression and Encoding develops methods to condense long inputs (Context Autoencoder[16]). Retrieval-Augmentation and External Knowledge integrates retrieval systems (HippoRAG[1], Retrieval Long Context[2]), and Training and Data Strategies for Long Context examines how models learn to process extended sequences (LongRecipe[4]). Dynamic Chunking and Segmentation, Query-Aware and Selective Processing, and Agent-Based Long-Context Processing address adaptive strategies for managing context, with the latter branch emphasizing reinforcement learning and agentic workflows.

A particularly active line of work contrasts architectural redesigns, such as recurrent or state-space models that avoid quadratic attention costs, with retrieval-augmented approaches that selectively fetch relevant context. Another tension lies between static compression techniques and dynamic, query-aware selection methods (InfLLM[18], Dynamic Chunking Selection[17]).

Within the Agent-Based Long-Context Processing branch, MemAgent[0] employs reinforcement learning to train memory-management policies, positioning itself among works that treat memory as a learnable decision-making process rather than a fixed architectural component.
This approach contrasts with retrieval-focused methods like HippoRAG[1], which rely on predefined indexing schemes, and with hierarchical agent frameworks such as HiAgent[28], which decompose tasks across multiple agents. By framing memory operations as RL-optimized actions, MemAgent[0] explores how adaptive policies can balance retention and eviction trade-offs in extended interactions.

Claimed Contributions

MEMAGENT agent workflow with overwrite-based memory management

The authors propose MEMAGENT, a new agent workflow that handles long-context tasks by dividing documents into segments and iteratively updating a fixed-length memory using an overwrite strategy. This approach enables processing of arbitrarily long texts with linear time complexity while maintaining performance.

9 retrieved papers
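Based on the description above, the claimed workflow can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the `call_llm` callable, segment size, memory cap, and prompt wording are all assumptions. The key point it demonstrates is that the memory stays bounded while the loop visits each segment exactly once, so cost grows linearly with document length.

```python
def answer_long_document(document, question, call_llm, seg_len=5000, mem_len=1024):
    """Sketch of an overwrite-based memory workflow (assumed interface):
    split the document into fixed-size segments and rewrite a bounded
    memory after each one, then answer from the final memory alone."""
    memory = ""  # fixed-length memory; fully overwritten at every step
    segments = [document[i:i + seg_len] for i in range(0, len(document), seg_len)]
    for seg in segments:
        prompt = (
            f"Question: {question}\n"
            f"Current memory: {memory}\n"
            f"New segment: {seg}\n"
            "Rewrite the memory, keeping only information relevant to the question."
        )
        memory = call_llm(prompt)[:mem_len]  # overwrite: the old memory is discarded
    return call_llm(f"Question: {question}\nMemory: {memory}\nAnswer:")
```

Because only the current segment and the bounded memory ever enter the context window, the per-step context size is constant regardless of total document length, which is what makes the claimed 8K-to-3.5M extrapolation plausible in principle.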
Multi-Conv DAPO algorithm for end-to-end memory optimization

The authors extend the DAPO reinforcement learning algorithm to create Multi-Conv DAPO, which optimizes memory capabilities end-to-end by treating each context-independent conversation as an optimization objective. This enables training of agent workflows with multiple rounds of memory updates across independent contexts.

10 retrieved papers
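The description suggests a group-relative RL objective in which every context-independent conversation of a rollout shares that rollout's outcome reward. A minimal sketch of how such an advantage broadcast could look is below; the group normalization follows the GRPO/DAPO family, but the exact normalization and per-conversation weighting used by the authors are assumptions here.

```python
import statistics

def multi_conv_advantages(group_rewards, convs_per_rollout):
    """Sketch (assumed details): normalize outcome rewards across a group of
    rollouts, then assign each rollout's advantage to every one of its
    independent conversations, so all memory-update turns share credit."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero variance
    advantages = []
    for reward, n_convs in zip(group_rewards, convs_per_rollout):
        adv = (reward - mean) / std
        advantages.append([adv] * n_convs)  # same advantage for each conversation
    return advantages
```

Treating each conversation as a separate optimization objective with a shared advantage is what allows end-to-end training even though no single conversation sees the whole document.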
RL-based approach for dynamically updated fixed-length memory in LLMs

The authors introduce a reinforcement learning method that enables LLMs to maintain and update a fixed-length memory dynamically as they process text segment-by-segment. This allows the model to handle arbitrary text lengths while maintaining linear time complexity during processing.

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In the retrieved landscape it therefore appears structurally isolated, which is one partial signal of novelty, though a signal constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

MEMAGENT agent workflow with overwrite-based memory management

The authors propose MEMAGENT, a new agent workflow that handles long-context tasks by dividing documents into segments and iteratively updating a fixed-length memory using an overwrite strategy. This approach enables processing of arbitrarily long texts with linear time complexity while maintaining performance.

Contribution

Multi-Conv DAPO algorithm for end-to-end memory optimization

The authors extend the DAPO reinforcement learning algorithm to create Multi-Conv DAPO, which optimizes memory capabilities end-to-end by treating each context-independent conversation as an optimization objective. This enables training of agent workflows with multiple rounds of memory updates across independent contexts.

Contribution

RL-based approach for dynamically updated fixed-length memory in LLMs

The authors introduce a reinforcement learning method that enables LLMs to maintain and update a fixed-length memory dynamically as they process text segment-by-segment. This allows the model to handle arbitrary text lengths while maintaining linear time complexity during processing.