SSDi8: Accurate and Efficient 8-bit Quantization for State Space Duality

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Mamba-2, State Space Duality (SSD), Quantization
Abstract:

Recent advances in sequence modeling have highlighted Mamba as a state space architecture that offers efficient long-range dependency modeling and provides a viable alternative to Transformers. Building upon this, Mamba-2 introduces Structured State Space Duality (SSD), which integrates recurrent and attention modes to achieve both efficiency and scalability. However, this architectural expansion substantially increases memory and latency overhead, underscoring the need for efficient compression strategies tailored to SSD. In this work, we present SSDi8, the first post-training quantization framework specifically designed for SSD that maintains a persistent INT8 path. SSDi8 introduces a reformulation that decouples element-wise multiplications from matrix multiplications, enabling quantized activations to be reused across modules. Moreover, SSDi8 adaptively quantizes channel-varying activations at cost-effective points, further reducing latency. On the accuracy side, SSDi8 explicitly leverages the intrinsic dimensional decomposition of SSD, exploiting distinct outlier distributions across axes, and incorporates an error correction term based on per-channel error statistics. Comprehensive experiments demonstrate that SSDi8 achieves accuracy comparable to FP16 while delivering up to a 1.4× speedup in W4A8 and W8A8 settings. We further validate its robustness in resource-constrained environments by deploying it on an NVIDIA Jetson Orin Nano device.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces SSDi8, a post-training quantization framework targeting Structured State Space Duality architectures with persistent INT8 paths. According to the taxonomy, it resides in the 'SSD-Specific INT8 Quantization Frameworks' leaf, which contains only two papers total. This leaf sits within the broader 'Selective State Space Model Quantization' branch, indicating a relatively sparse research direction focused on architecture-specific optimizations for SSD variants rather than general Mamba models.

The taxonomy reveals neighboring work in 'General Mamba Post-Training Quantization' (containing Quamba and Q-Mamba) and 'Small-Scale and Edge-Optimized SSM Quantization' (Quantizing Edge SSM). These sibling leaves address post-training compression for Mamba variants but differ in scope: general Mamba methods handle multi-bit precision without SSD-specific optimizations, while edge-focused work prioritizes resource constraints over architectural duality. The taxonomy explicitly excludes quantization-aware training methods and vision-specific Mamba variants, positioning SSDi8 within a narrower post-training context.

Of the 21 candidate papers examined in total, the core contribution (the SSDi8 PTQ framework for SSD) was compared against 10 candidates, with no clear refutation found. The sparse-aware reformulation was compared against only 1 candidate, with no overlap found. However, the error correction mechanism based on per-channel statistics encountered 2 refutable candidates among its 10 examined, suggesting this component has more substantial prior work in the quantization literature. Given the limited search scope, these statistics reflect top semantic matches rather than exhaustive coverage.

Given the sparse taxonomy leaf and limited refutation signals, the work appears to occupy a relatively novel position within SSD-specific quantization. The analysis covers top-21 semantic candidates and does not claim exhaustive field coverage. The error correction component shows more overlap with existing techniques, while the SSD-tailored framework and reformulation appear more distinctive within the examined scope.

Taxonomy

Core-task taxonomy papers: 9
Claimed contributions: 3
Contribution candidate papers compared: 21
Refutable papers: 2

Research Landscape Overview

Core task: 8-bit quantization for Structured State Space Duality models. The field organizes around three main branches that reflect different stages and scopes of model compression. Post-Training Quantization Methods for State Space Models reduce precision after training is complete, with works like Quamba[2] and Q-Mamba[3] developing selective strategies that identify which components of state space architectures are most sensitive to quantization. Quantization-Aware Training and Co-Design Approaches integrate precision reduction directly into the training loop or jointly optimize architecture and bit-width, as seen in QAT Survey[7] and AutoNeural[8]. Specialized Applications and Hybrid Architectures address domain-specific constraints or combine state space models with other paradigms, exemplified by TVMamba[9] and Spikingbrain[5], which adapt quantization to particular deployment contexts or neuromorphic settings.

Recent activity has concentrated on refining post-training methods that balance compression ratio against the unique computational patterns of selective state space models. A central tension emerges between aggressive uniform quantization and more nuanced layer-wise or context-sensitive schemes, with Context-Aware Quantization[6] exploring adaptive strategies and Slender-Mamba[4] pursuing extreme sparsity alongside reduced precision.

SSDi8[0] sits squarely within the selective state space model quantization cluster, proposing an INT8 framework tailored to the duality structure of SSD architectures. Compared to broader approaches like Q-Mamba[3], which targets general Mamba variants, SSDi8[0] emphasizes architecture-specific optimizations that exploit the mathematical properties of duality layers. Meanwhile, Quantizing Edge SSM[1] addresses similar post-training challenges but prioritizes edge deployment constraints, highlighting an open question about whether specialized frameworks or unified methods will prove more effective as state space models diversify.

Claimed Contributions

SSDi8 post-training quantization framework for SSD

The authors introduce SSDi8, a novel post-training quantization framework tailored for Structured State Space Duality (SSD) in Mamba-2. This framework maintains a persistent INT8 execution path through the SSD architecture, addressing the unique computational organization and challenges of quantizing SSD layers.

10 retrieved papers
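The report does not include SSDi8's actual kernels, so as a rough illustration of what a "persistent INT8 path" buys, the following NumPy sketch quantizes an activation once and performs the matrix multiply entirely in integer arithmetic, dequantizing only at the output. The per-tensor symmetric scheme and all function names here are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def quantize_int8(x):
    # Illustrative symmetric per-tensor quantization (assumed, not SSDi8's scheme).
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(q_x, s_x, q_w, s_w):
    # INT8 x INT8 with INT32 accumulation; dequantize once at the output.
    acc = q_x.astype(np.int32) @ q_w.astype(np.int32)
    return acc.astype(np.float32) * (s_x * s_w)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64), dtype=np.float32)   # activation
w = rng.standard_normal((64, 32), dtype=np.float32)  # weight

q_x, s_x = quantize_int8(x)   # quantize the activation once...
q_w, s_w = quantize_int8(w)
y = int8_matmul(q_x, s_x, q_w, s_w)
# ...and keep q_x in INT8 for downstream modules ("persistent INT8 path"),
# instead of dequantizing to FP16 and re-quantizing between operations.
```

Keeping `q_x` resident in INT8 is what avoids the repeated quantize/dequantize round-trips that the framework's reuse claim targets.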
Sparse-aware reformulation for element-wise operations

The authors propose a sparse-aware reformulation that separates element-wise multiplications from matrix multiplications within SSD. This reformulation enables quantized activation reuse across multiple modules and maintains the INT8 execution path, with formal mathematical guarantees provided through theoretical analysis.

1 retrieved paper
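As a toy picture of the decoupling this contribution describes, the sketch below uses the identity (d ⊙ X)W = d ⊙ (XW), valid when the element-wise factor d broadcasts per row, to pull the element-wise multiply out of the matmul: the activation is then quantized once and reused by two integer matmuls. This is a minimal sketch of the separation idea only; the actual SSDi8 reformulation, its sparsity handling, and its formal guarantees are not reproduced here.

```python
import numpy as np

def quantize_int8(x):
    # Illustrative symmetric per-tensor quantization.
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 64), dtype=np.float32)
W1 = rng.standard_normal((64, 32), dtype=np.float32)
W2 = rng.standard_normal((64, 32), dtype=np.float32)
d = rng.random((8, 1), dtype=np.float32)   # per-token element-wise factor

qX, sX = quantize_int8(X)
qW1, sW1 = quantize_int8(W1)
qW2, sW2 = quantize_int8(W2)

# Identity: (d * X) @ W == d * (X @ W) when d broadcasts over rows.
# Pulling d outside means X is quantized once and both matmuls stay INT8;
# naively, d * X would need a fresh quantization before every matmul.
XW1 = (qX.astype(np.int32) @ qW1.astype(np.int32)).astype(np.float32) * (sX * sW1)
XW2 = (qX.astype(np.int32) @ qW2.astype(np.int32)).astype(np.float32) * (sX * sW2)

y1 = d * XW1   # element-wise factor applied after the INT8 matmul
y2 = XW2       # second branch reuses the same quantized activation qX
```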
Error correction based on per-channel statistics

The authors introduce a mean correction strategy that compensates for quantization errors using per-channel error statistics. This correction term is derived in closed form and applied through a layer-wise sequential update strategy to mitigate error accumulation across SSD layers.

10 retrieved papers (2 can refute)
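The generic form of such a mean correction can be sketched as follows: estimate, on calibration data, the per-output-channel mean of the error that weight quantization introduces, then subtract it at inference (equivalently, fold it into the bias). SSDi8's specific closed-form term and layer-wise sequential update are not reproduced; this is the standard bias-correction idea under assumed per-tensor quantization.

```python
import numpy as np

def quantize_int8(x):
    # Illustrative symmetric per-tensor quantization.
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32), dtype=np.float32)
qW, sW = quantize_int8(W)
W_hat = qW.astype(np.float32) * sW           # dequantized weights

# Calibration: per-output-channel mean of the quantization-induced error.
# Inputs are given a nonzero mean so the systematic bias is visible.
X_cal = rng.standard_normal((512, 64), dtype=np.float32) + 0.5
correction = (X_cal @ W_hat - X_cal @ W).mean(axis=0)   # one value per channel

# Inference: subtract the expected error (or fold it into the layer bias).
X = rng.standard_normal((2048, 64), dtype=np.float32) + 0.5
y_naive = X @ W_hat
y_corrected = y_naive - correction
```

The correction removes the systematic per-channel offset while leaving the zero-mean part of the quantization noise untouched, which is why per-channel error statistics suffice for a closed-form term.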

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: SSDi8 post-training quantization framework for SSD
Contribution 2: Sparse-aware reformulation for element-wise operations
Contribution 3: Error correction based on per-channel statistics