Abstract:

Despite improvements from length extrapolation, efficient attention, and memory modules, handling arbitrarily long documents without performance degradation during extrapolation remains the central challenge in long-text processing. To address this problem, we introduce a novel agent workflow, MemAgent, which processes text in segments and updates a fixed-length memory through an overwrite strategy, tackling long-context tasks through enhanced memory management. We further extend the DAPO algorithm to directly optimize memory ability in an end-to-end fashion, enabling training via independent-context multi-conversation generation. Experimental results demonstrate that MemAgent has strong long-context capabilities: it extrapolates from an 8K context to a 3.5M-token QA task with less than 10% performance loss, and achieves over 95% on the 512K NIAH test.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 1

Research Landscape Overview

Core task: Long-context language model processing with memory management. The field addresses how language models can efficiently handle extended sequences that exceed typical context windows, organizing solutions into several major branches. Memory Architecture and Representation explores structured storage mechanisms such as episodic buffers and hierarchical memory systems (e.g., MemoryBank[6], Cognitive Memory[12]). Attention Mechanisms and Architectural Alternatives investigates alternatives to standard attention, including recurrent designs (Retentive Network[5], RecurrentGemma[27]) and hybrid approaches (Samba[24]). KV Cache Management and Optimization focuses on efficient key-value storage strategies (PagedAttention[13], Attention Sinks[9]), while Context Compression and Encoding develops methods to condense long inputs (Context Autoencoder[16]). Retrieval-Augmentation and External Knowledge integrates retrieval systems (HippoRAG[1], Retrieval Long Context[2]), and Training and Data Strategies for Long Context examines how models learn to process extended sequences (LongRecipe[4]). Dynamic Chunking and Segmentation, Query-Aware and Selective Processing, and Agent-Based Long-Context Processing address adaptive strategies for managing context, with the latter branch emphasizing reinforcement learning and agentic workflows.

A particularly active line of work contrasts architectural redesigns, such as recurrent or state-space models that avoid quadratic attention costs, with retrieval-augmented approaches that selectively fetch relevant context. Another tension lies between static compression techniques and dynamic, query-aware selection methods (InfLLM[18], Dynamic Chunking Selection[17]).

Within the Agent-Based Long-Context Processing branch, MemAgent[0] employs reinforcement learning to train memory-management policies, positioning itself among works that treat memory as a learnable decision-making process rather than a fixed architectural component.
This approach contrasts with retrieval-focused methods like HippoRAG[1], which rely on predefined indexing schemes, and with hierarchical agent frameworks such as HiAgent[28], which decompose tasks across multiple agents. By framing memory operations as RL-optimized actions, MemAgent[0] explores how adaptive policies can balance retention and eviction trade-offs in extended interactions.

Claimed Contributions

MEMAGENT agent workflow with overwrite-based memory management

The authors propose MEMAGENT, a new agent workflow that handles long-context tasks by dividing documents into segments and iteratively updating a fixed-length memory using an overwrite strategy. This approach enables processing of arbitrarily long texts with linear time complexity while maintaining performance.

9 retrieved papers
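Based on the description above, the claimed workflow can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the `call_llm` callable, segment size, memory cap, and prompt wording are all assumptions. The key point it demonstrates is that the memory stays bounded while the loop visits each segment exactly once, so cost grows linearly with document length.

```python
def answer_long_document(document, question, call_llm, seg_len=5000, mem_len=1024):
    """Sketch of an overwrite-based memory workflow (assumed interface):
    split the document into fixed-size segments and rewrite a bounded
    memory after each one, then answer from the final memory alone."""
    memory = ""  # fixed-length memory; fully overwritten at every step
    segments = [document[i:i + seg_len] for i in range(0, len(document), seg_len)]
    for seg in segments:
        prompt = (
            f"Question: {question}\n"
            f"Current memory: {memory}\n"
            f"New segment: {seg}\n"
            "Rewrite the memory, keeping only information relevant to the question."
        )
        memory = call_llm(prompt)[:mem_len]  # overwrite: the old memory is discarded
    return call_llm(f"Question: {question}\nMemory: {memory}\nAnswer:")
```

Because only the current segment and the bounded memory ever enter the context window, the per-step context size is constant regardless of total document length, which is what makes the claimed 8K-to-3.5M extrapolation plausible in principle.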
Multi-Conv DAPO algorithm for end-to-end memory optimization

The authors extend the DAPO reinforcement learning algorithm to create Multi-Conv DAPO, which optimizes memory capabilities end-to-end by treating each context-independent conversation as an optimization objective. This enables training of agent workflows with multiple rounds of memory updates across independent contexts.

10 retrieved papers
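The description suggests a group-relative RL objective in which every context-independent conversation of a rollout shares that rollout's outcome reward. A minimal sketch of how such an advantage broadcast could look is below; the group normalization follows the GRPO/DAPO family, but the exact normalization and per-conversation weighting used by the authors are assumptions here.

```python
import statistics

def multi_conv_advantages(group_rewards, convs_per_rollout):
    """Sketch (assumed details): normalize outcome rewards across a group of
    rollouts, then assign each rollout's advantage to every one of its
    independent conversations, so all memory-update turns share credit."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero variance
    advantages = []
    for reward, n_convs in zip(group_rewards, convs_per_rollout):
        adv = (reward - mean) / std
        advantages.append([adv] * n_convs)  # same advantage for each conversation
    return advantages
```

Treating each conversation as a separate optimization objective with a shared advantage is what allows end-to-end training even though no single conversation sees the whole document.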
RL-based approach for dynamically updated fixed-length memory in LLMs

The authors introduce a reinforcement learning method that enables LLMs to maintain and update a fixed-length memory dynamically as they process text segment-by-segment. This allows the model to handle arbitrary text lengths while maintaining linear time complexity during processing.

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In the retrieved landscape it therefore appears structurally isolated, which is one partial signal of novelty, though a signal constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

MEMAGENT agent workflow with overwrite-based memory management

The authors propose MEMAGENT, a new agent workflow that handles long-context tasks by dividing documents into segments and iteratively updating a fixed-length memory using an overwrite strategy. This approach enables processing of arbitrarily long texts with linear time complexity while maintaining performance.

Contribution

Multi-Conv DAPO algorithm for end-to-end memory optimization

The authors extend the DAPO reinforcement learning algorithm to create Multi-Conv DAPO, which optimizes memory capabilities end-to-end by treating each context-independent conversation as an optimization objective. This enables training of agent workflows with multiple rounds of memory updates across independent contexts.

Contribution

RL-based approach for dynamically updated fixed-length memory in LLMs

The authors introduce a reinforcement learning method that enables LLMs to maintain and update a fixed-length memory dynamically as they process text segment-by-segment. This allows the model to handle arbitrary text lengths while maintaining linear time complexity during processing.