MoM: Linear Sequence Modeling with Mixture-of-Memories
Overview
Overall Novelty Assessment
The paper introduces Mixture-of-Memories (MoM), an architecture that employs multiple independent memory states with a router network to direct tokens to specific states, thereby increasing overall memory capacity while reducing interference. Within the taxonomy, this work occupies a unique position: it is the sole paper in the 'Mixture-of-Memories and Multi-State Architectures' leaf, which itself is a distinct branch among ten major research directions. This isolation suggests the paper explores a relatively sparse research direction—multi-state memory systems with routing—compared to more crowded areas like selective state space models or memory-augmented transformers.
The taxonomy reveals that neighboring research directions include selective state space models (e.g., Mamba) that use content-based selection within a single state, gated linear attention mechanisms that incorporate slot-based or gating strategies, and memory-augmented transformers that rely on external memory modules. MoM diverges from these by distributing memory across multiple independent states rather than enhancing a single memory mechanism. The scope notes clarify that multi-scale state space models without routing belong elsewhere, emphasizing that MoM's routing-based multi-state approach is architecturally distinct from both single-state selective models and external memory augmentation strategies.
Among the 23 candidates examined via a limited semantic search, none clearly refuted any of the three main contributions. For the core MoM architecture, 10 candidates were reviewed and none overlapped closely enough to refute it; the general framework claim (7 candidates) and the hardware-efficient implementation (6 candidates) likewise surfaced no prior work that directly anticipates the approach. Within the examined scope, then, the multi-state routing concept and its integration with diverse memory update mechanisms appear relatively novel, though the search was not exhaustive and was restricted to top-K semantic matches.
Overall, the analysis indicates that MoM occupies a sparsely populated research niche within linear sequence modeling. The absence of sibling papers in its taxonomy leaf and the lack of refutable prior work among examined candidates suggest the approach is architecturally distinct from existing methods. However, this assessment is based on a limited literature search of 23 papers, and a broader survey might reveal related multi-state or routing-based memory systems not captured by the current semantic search scope.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose MoM, a new architecture that uses multiple independent memory states instead of a single fixed-size memory state. A router network selectively directs input tokens to specific memory states, which enhances overall memory capacity while minimizing memory interference in linear sequence models.
MoM is designed as a flexible framework that can integrate various memory update mechanisms from different linear sequence modeling methods, such as linear attention, state space models, and linear RNNs, making it broadly applicable across existing approaches.
The authors develop a hardware-efficient implementation that reorders tokens according to routing results and uses varlen (variable-length) operations with Triton kernels. This approach enables MoM to retain linear-time training complexity and constant-time inference complexity while efficiently processing multiple memory states.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
Mixture-of-Memories (MoM) architecture
The authors propose MoM, a new architecture that uses multiple independent memory states instead of a single fixed-size memory state. A router network selectively directs input tokens to specific memory states, which enhances overall memory capacity while minimizing memory interference in linear sequence models.
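To make the routing idea concrete, below is a minimal NumPy sketch of the mechanism described above: a softmax router scores each token against several independent matrix-valued memory states, only the top-k states are updated, and the output mixes reads from the activated states. All names (`W_router`, `route`, `step`) and the specific update rule (a plain outer-product, linear-attention-style write with key = value = token) are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_mem, top_k = 4, 3, 2                        # hidden dim, number of memories, memories activated per token

W_router = rng.normal(size=(d, n_mem))           # hypothetical router weights
memories = [np.zeros((d, d)) for _ in range(n_mem)]  # one independent matrix-valued state per memory

def route(x, k=top_k):
    """Score the token against all memories and pick the top-k with softmax gates."""
    logits = x @ W_router
    idx = np.argsort(logits)[-k:]                # indices of the k highest-scoring memories
    w = np.exp(logits[idx] - logits[idx].max())
    return idx, w / w.sum()                      # renormalised gate weights over the chosen memories

def step(x):
    """One token: update only the routed memories, read a gate-weighted mixture."""
    idx, gates = route(x)
    out = np.zeros(d)
    for i, g in zip(idx, gates):
        memories[i] += np.outer(x, x)            # stand-in linear-attention write (key = value = x for brevity)
        out += g * (memories[i] @ x)             # gated read from each activated memory
    return out

tokens = rng.normal(size=(6, d))
outputs = np.stack([step(t) for t in tokens])
```

Because each token touches only k of the n memories, writes to one memory cannot interfere with tokens routed elsewhere, which is the capacity/interference trade-off the contribution claims.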
[57] Multi-interest network with dynamic routing for recommendation at Tmall
[58] Rcr-router: Efficient role-aware context routing for multi-agent llm systems with structured memory
[59] Contextual self-referential memory trajectories for large language model consistency
[60] Brain-like slot representation for sequence working memory in recurrent neural networks
[61] Unified Spatio-Temporal Dynamic Routing for Efficient Video Object Segmentation
[62] Sequential Recommendation with Decomposed Item Feature Routing
[63] Hybrid Reasoning Network for Video-based Commonsense Captioning
[64] Speech Emotion Recognition Using Sequential Capsule Networks
[65] A dynamic routing CapsNet based on increment prototype clustering for overcoming catastrophic forgetting
[66] Transformer-style relational reasoning with dynamic memory updating for temporal network modeling
General framework compatible with diverse memory update mechanisms
MoM is designed as a flexible framework that can integrate various memory update mechanisms from different linear sequence modeling methods, such as linear attention, state space models, and linear RNNs, making it broadly applicable across existing approaches.
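The framework claim amounts to the memory update rule being a pluggable function: the same driving loop works whether the state evolves by a linear-attention write or a decaying gated-RNN write. The sketch below illustrates this with two stand-in update rules; the function names, signatures, and the fixed `decay` constant are our own illustrative assumptions, not the paper's interface.

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)

# Two illustrative update rules from the linear-sequence-modeling family
# (hypothetical names/signatures, chosen only to show the pluggability):
def linear_attention_update(M, k, v):
    return M + np.outer(k, v)                    # additive key-value outer product

def gated_update(M, k, v, decay=0.9):
    return decay * M + np.outer(k, v)            # decaying state, as in gated linear RNNs

def run(update_fn, keys, values):
    """Drive a single memory state with an arbitrary update rule."""
    M = np.zeros((d, d))
    for k, v in zip(keys, values):
        M = update_fn(M, k, v)
    return M

keys = rng.normal(size=(5, d))
values = rng.normal(size=(5, d))
M_lin = run(linear_attention_update, keys, values)
M_gated = run(gated_update, keys, values)
```

Swapping `update_fn` changes the memory dynamics without touching the routing or read path, which is what makes the framework compatible with linear attention, state space models, and linear RNNs alike.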
[1] Mamba: Linear-Time Sequence Modeling with Selective State Spaces
[27] Gated slot attention for efficient linear-time sequence modeling
[67] MambaEVT: Event Stream based Visual Object Tracking using State Space Model
[68] Neuromorphic principles in self-attention hardware for efficient transformers
[69] Mamba-ST: State Space Model for Efficient Style Transfer
[70] Demystify Mamba in Vision: A Linear Attention Perspective
[71] In-Context Learning as General-Purpose Learning: A Comprehensive Survey and New Perspectives
Hardware-efficient implementation using varlen operations
The authors develop a hardware-efficient implementation that reorders tokens according to routing results and uses varlen (variable-length) operations with Triton kernels. This approach enables MoM to retain linear-time training complexity and constant-time inference complexity while efficiently processing multiple memory states.
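The reordering idea can be sketched without Triton: sort tokens so that each memory's tokens form a contiguous run (a stable sort preserves causal order within each run), record varlen-style cumulative offsets, scan each run against its own state, and scatter the results back to the original order. The hard top-1 routing and the outer-product update below are simplifying assumptions for illustration; the real implementation fuses this pattern into variable-length Triton kernels.

```python
import numpy as np

rng = np.random.default_rng(2)
T, d, n_mem = 8, 4, 3
tokens = rng.normal(size=(T, d))
assignment = rng.integers(0, n_mem, size=T)          # hard routing decision per token (top-1 for simplicity)

# 1. Reorder tokens so each memory's tokens are contiguous
#    (stable sort keeps causal order within each group).
perm = np.argsort(assignment, kind="stable")
sorted_tokens = tokens[perm]
counts = np.bincount(assignment, minlength=n_mem)
offsets = np.concatenate(([0], np.cumsum(counts)))   # varlen-style cumulative sequence lengths

# 2. Process each variable-length group as one contiguous recurrent scan.
outputs = np.empty_like(sorted_tokens)
for m in range(n_mem):
    M = np.zeros((d, d))                             # this memory's state
    for i in range(offsets[m], offsets[m + 1]):
        x = sorted_tokens[i]
        M += np.outer(x, x)                          # stand-in memory update
        outputs[i] = M @ x

# 3. Scatter results back to the original token order.
inv = np.empty(T, dtype=int)
inv[perm] = np.arange(T)
outputs = outputs[inv]
```

Each token is still processed exactly once, so training cost stays linear in sequence length, and at inference each step touches only the fixed-size states of its routed memories, keeping per-token cost constant.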