Mamba-3: Improved Sequence Modeling using State Space Principles

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: State Space Models, Mamba, LLMs, Subquadratic Models
Abstract:

The recent scaling of test-time compute for LLMs has restricted practical deployment to models that can generate high-quality outputs in an inference-efficient manner. While Transformer-based models are the current standard, their quadratic compute and linear memory bottlenecks have spurred the development of sub-quadratic models with linearly scaling compute and constant memory requirements. However, many recent linear-style models lack certain capabilities or lag behind in quality, and even their linear-time inference is not hardware-efficient. Guided by an inference-first perspective, we introduce three core methodological improvements inspired by the state-space-model viewpoint of linear models: 1) a more expressive recurrence, 2) a complex state-update rule that enables richer state tracking, and 3) a multi-input, multi-output formulation, together resulting in a stronger model that better exploits hardware parallelism during decoding. Together with architectural refinements, our Mamba-3 model achieves significant gains across retrieval, state-tracking, and downstream language-modeling tasks. The new architecture sets the Pareto frontier for performance under a fixed inference budget and outperforms strong baselines in head-to-head comparisons.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Mamba-3, a state space model architecture that combines trapezoidal discretization, complex-valued state updates with data-dependent rotary position embeddings, and a multi-input multi-output formulation to improve inference efficiency. It resides in the State Space Model Foundations leaf, which contains four papers including foundational work like Mamba and Structured Linear CDEs. This leaf represents a moderately populated research direction within the broader Linear-Time Sequence Model Architectures branch, indicating active but not overcrowded exploration of core SSM design principles.

The taxonomy reveals that State Space Model Foundations sits alongside Linear Attention Mechanisms (three papers), Recurrent and Convolutional Sequence Models (four papers), and Hybrid and Multi-Modal Architectures (four papers). These neighboring leaves explore alternative paths to linear complexity: attention approximations, gated recurrence, and architectural fusion. Mamba-3's focus on enriching the SSM recurrence and state update rules positions it as an evolution within the SSM paradigm rather than a hybrid approach, distinguishing it from multi-modal extensions or attention-based alternatives in sibling categories.

Among 30 candidates examined, the trapezoidal discretization contribution shows no clear refutation across 10 candidates, suggesting relative novelty in this specific discretization scheme. The complex-valued state update rule encountered one refutable candidate among 10 examined, indicating some prior exploration of complex state mechanisms. The MIMO formulation found two refutable candidates among 10, suggesting more substantial prior work on multi-channel or parallel processing strategies. These statistics reflect a limited semantic search scope, not exhaustive coverage, and indicate that the discretization method appears least explored while the MIMO approach has more documented precedents.

Based on the top-30 semantic matches and taxonomy structure, the work appears to advance an active but not saturated research direction. The contribution-level analysis suggests incremental refinement of existing SSM concepts rather than entirely novel primitives, though the specific combination and hardware-oriented design may offer practical value. The limited search scope means potentially relevant work outside the top-30 candidates or in adjacent subfields may not be captured in this assessment.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 3

Research Landscape Overview

Core task: efficient inference for linear-time sequence models. The field encompasses architectures and techniques that process sequences in linear time, avoiding the quadratic complexity of standard attention mechanisms. The taxonomy reveals several major branches: foundational architectures (including state space models and their variants), optimization techniques for faster inference (such as hardware-aware kernels and memory-efficient implementations), training and compression methods that reduce model size or computational overhead, domain-specific applications spanning speech, vision, and time-series forecasting, error correction and decoding theory from coding and communication systems, theoretical analyses of convergence and expressiveness, and survey literature synthesizing recent progress.

Representative works like Mamba[49] and xLSTM 7B[10] illustrate how state space model foundations enable scalable sequence modeling, while efforts such as Flash Inference[12] and Hardware-efficient Attention[4] demonstrate inference optimization in practice. Meanwhile, compression approaches like In-Training Compression SSMs[43] and modular designs such as Sparse Modular Activation[2] address efficiency from complementary angles.

A particularly active line of work centers on state space model architectures that balance expressiveness with computational efficiency, contrasting with traditional recurrent and attention-based methods. Mamba-3[0] sits within this dense branch of state space model foundations, building on the selective state space framework introduced by Mamba[49] and exploring structured parameterizations akin to Structured Linear CDEs[42]. Compared to nearby efforts like In-Training Compression SSMs[43], which emphasizes reducing memory footprint during training, Mamba-3[0] focuses more directly on architectural innovations that preserve linear-time inference guarantees while enhancing modeling capacity.
This positioning reflects a broader tension in the field: whether to prioritize architectural expressiveness, aggressive compression, or hardware-specific optimizations. Open questions remain around the trade-offs between these dimensions, especially as models scale and as domain-specific constraints—from real-time speech decoding to long-context document understanding—demand tailored solutions.

Claimed Contributions

Trapezoidal discretization for state-space models

The authors introduce a generalized trapezoidal discretization method for state-space models that provides a second-order accurate approximation, yielding a more expressive recurrence than Mamba-2's Euler-based approach. This discretization can be viewed as applying a data-dependent convolution and, combined with biases applied to B and C, empirically eliminates the need for the short causal convolution.
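As numerical intuition for the second-order claim, a minimal sketch (a generic scalar ODE, not the paper's parameterization) compares a first-order Euler update against the trapezoidal rule:

```python
import numpy as np

# Scalar test ODE dx/dt = a*x with known solution x(t) = exp(a*t).
# Euler is first-order accurate (as in Mamba-2-style discretization);
# the trapezoidal rule is second-order. Generic numerical sketch only.

def euler_step(x, a, dt):
    # x_{t+1} = (1 + dt*a) * x_t
    return (1.0 + dt * a) * x

def trapezoidal_step(x, a, dt):
    # Implicit trapezoidal rule, solved in closed form for the scalar case:
    # x_{t+1} = x_t + (dt/2) * (a*x_t + a*x_{t+1})
    return (1.0 + 0.5 * dt * a) / (1.0 - 0.5 * dt * a) * x

def rollout(step, a=-1.0, dt=0.1, n=100):
    x = 1.0
    for _ in range(n):
        x = step(x, a, dt)
    return x

exact = np.exp(-1.0 * 0.1 * 100)  # true x(T) at T = n*dt = 10
err_euler = abs(rollout(euler_step) - exact)
err_trap = abs(rollout(trapezoidal_step) - exact)
print(err_trap < err_euler)  # True: trapezoidal is markedly more accurate
```

Halving dt cuts the Euler error roughly in half but the trapezoidal error roughly by four, which is what "second-order accurate" means in practice.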

10 retrieved papers
Complex-valued state update rule with data-dependent RoPE

The authors propose using complex-valued state-space models that enable rotational hidden state dynamics, addressing state-tracking limitations in prior linear models. They show this is equivalent to applying data-dependent rotary embeddings (RoPE) on input and output projections, enabling efficient implementation while recovering capabilities like parity and modular arithmetic that Mamba-2 cannot solve.
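The claimed equivalence between a complex-valued diagonal recurrence and rotations of a real state can be checked numerically. A minimal sketch with hypothetical toy dimensions and data-dependent angles:

```python
import numpy as np

# A complex diagonal recurrence h_t = exp(i*theta_t)*h_{t-1} + b_t*x_t
# is equivalent to a real 2x2 rotation acting on the stacked (Re, Im)
# state -- the "data-dependent RoPE" view. Toy sketch, not the paper's
# exact parameterization.

rng = np.random.default_rng(0)
T = 16
theta = rng.uniform(-np.pi, np.pi, T)          # data-dependent angles
x = rng.normal(size=T)
b = rng.normal(size=T) + 1j * rng.normal(size=T)

# Complex recurrence
h_c = 0j
for t in range(T):
    h_c = np.exp(1j * theta[t]) * h_c + b[t] * x[t]

# Equivalent real recurrence with a 2x2 rotation matrix
def rot(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

h_r = np.zeros(2)
for t in range(T):
    h_r = rot(theta[t]) @ h_r + np.array([b[t].real, b[t].imag]) * x[t]

ok = np.allclose(h_r, [h_c.real, h_c.imag])
print(ok)  # True
```

Unrolling the real recurrence expresses the state as input contributions rotated by cumulative angles, i.e. rotary embeddings with data-dependent angles applied to the input and output projections, matching the description above.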

10 retrieved papers
Can Refute
Multi-input multi-output (MIMO) formulation for improved hardware utilization

The authors introduce a MIMO variant that shifts from outer-product-based to matrix-multiplication-based state updates, increasing arithmetic intensity and improving hardware utilization during decoding. This formulation packs more compute into each state update without increasing the state size, pushing out the Pareto frontier of inference efficiency while maintaining or improving model quality.
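The shift from outer-product to matrix-multiplication updates can be illustrated with toy shapes (hypothetical dimensions, not the paper's kernel):

```python
import numpy as np

# Single-input update: the state H (n x d) receives a rank-1 outer
# product per step. MIMO update: r input/output channels turn this
# into a rank-r matrix multiply -- r times the FLOPs against the same
# state memory, i.e. higher arithmetic intensity at fixed state size.
# Toy shapes only.

n, d, r = 8, 16, 4
rng = np.random.default_rng(1)
H = np.zeros((n, d))

# Rank-1 (outer-product) update: n*d multiply-adds
B1, x1 = rng.normal(size=n), rng.normal(size=d)
H_siso = H + np.outer(B1, x1)

# Rank-r (matrix-multiply) update: r*n*d multiply-adds, same state shape
B, X = rng.normal(size=(n, r)), rng.normal(size=(r, d))
H_mimo = H + B @ X

assert H_siso.shape == H_mimo.shape == (n, d)  # state size unchanged
```

During decoding the state must be read from memory either way, so performing r·n·d multiply-adds instead of n·d per read raises the compute-to-bytes ratio, which is what better saturates matrix-multiply hardware.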

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
