DriveMamba: Task-Centric Scalable State Space Model for Efficient End-to-End Autonomous Driving
Overview
Overall Novelty Assessment
The paper proposes DriveMamba, a unified end-to-end autonomous driving framework that integrates perception, prediction, and planning using Mamba state space models. According to the taxonomy, this work resides in the 'Mamba-Based Unified End-to-End Driving' leaf, which currently contains no sibling papers. This leaf sits within the broader 'State Space Model Architectures for Perception and Planning' branch, which includes only four total papers across four distinct leaves. The sparse population of this branch suggests that applying Mamba architectures to unified end-to-end driving is an emerging and relatively unexplored research direction within the field.
The taxonomy reveals that neighboring research directions include 'Mamba for Temporal BEV Perception' (BevMamba), 'Mamba for Multi-Modal Video Understanding', and 'Trajectory Prediction with Selective State Spaces', each focusing on specific subtasks rather than unified end-to-end systems. The broader field shows substantial activity in 'Latent World Model-Based End-to-End Driving' (seven papers across four leaves) and 'Deep Reinforcement Learning for End-to-End Driving' (five papers across five leaves). DriveMamba diverges from these directions by eschewing explicit latent dynamics modeling or pure RL optimization in favor of direct state space sequential processing across all driving tasks simultaneously.
Across the three identified contributions, the literature search examined 28 candidates in total and found no refutable pairs. It examined eight candidates for Contribution A (Task-Centric Scalable paradigm), ten for Contribution B (bidirectional trajectory-guided scan), and ten for Contribution C (Unified Mamba decoder), with zero refutations in each case. Because the search covers only the top-K semantic matches rather than the whole field, this result shows only that no prior work within the examined candidate pool directly overlaps with the proposed technical innovations. Given the small search scale, substantial related work may exist outside the examined set.
Given the sparse taxonomy leaf (zero siblings) and the absence of refutations among 28 examined candidates, the work appears to occupy a relatively novel position within the limited search scope. The integration of Mamba state space models into a single-stage unified driving architecture represents a distinct approach compared to the examined latent world model and RL-based methods. However, the analysis is constrained by the top-K semantic search methodology and does not constitute an exhaustive literature review, leaving open the possibility of relevant prior work in adjacent research communities or recent preprints.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce DriveMamba, a novel end-to-end autonomous driving framework that uses a unified Mamba decoder to simultaneously model task relations, learn view correspondences, and fuse temporal information. This paradigm operates on sparse token-level representations rather than dense BEV features, enabling efficient and scalable processing with linear complexity.
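The linear-complexity claim can be made concrete with a minimal sketch of a Mamba-style selective state space scan over a token sequence, written in plain Python. This is not DriveMamba's implementation. The single feature channel, the diagonal state entries `a`, the scalar projections `w_b`, `w_c`, `w_delta`, `b_delta`, and the function name `selective_scan` are all illustrative assumptions. The sketch shows that each token is visited once with a fixed-size state, so cost grows as O(L·N) in sequence length L and state size N, rather than quadratically as with attention.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable softplus: log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(x: List[float], a: List[float], w_b: List[float],
                   w_c: List[float], w_delta: float, b_delta: float) -> List[float]:
    """Simplified single-channel selective SSM scan (illustrative assumption).

    x: one feature channel of the token sequence, length L
    a: diagonal state entries (negative for stability), length N
    w_b, w_c: projections that make B_t and C_t depend on the current input
    """
    n = len(a)
    h = [0.0] * n                      # fixed-size hidden state, reused for every token
    ys = []
    for x_t in x:                      # single pass over L tokens -> O(L * N) total work
        delta = softplus(w_delta * x_t + b_delta)   # input-dependent step size
        y_t = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])          # zero-order-hold discretization of A
            b_t = w_b[i] * x_t                      # input-dependent B_t
            h[i] = a_bar * h[i] + delta * b_t * x_t # simplified discretization of B
            y_t += (w_c[i] * x_t) * h[i]            # input-dependent C_t readout
        ys.append(y_t)
    return ys


# Usage: four scalar "tokens", two-dimensional state.
tokens = [0.5, -1.0, 0.25, 2.0]
y = selective_scan(tokens, a=[-1.0, -0.5], w_b=[1.0, 0.5], w_c=[0.3, -0.2],
                   w_delta=1.0, b_delta=0.0)
print(len(y))  # 4: one output per token
```

Because the recurrence is causal, changing a later token never alters earlier outputs. A bidirectional variant, as in the decoder described below, would add a second scan over the reversed sequence.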
The authors design a bidirectional trajectory-guided local-to-global scan, a hybrid spatiotemporal scanning method that orders tokens by their 3D positions and the ego-vehicle trajectory. The scan preserves spatial locality from the ego vehicle's perspective and captures task-related dependencies in a form suited to interactive planning.
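The exact ordering rule is not specified here, so the following sketch is only one plausible reading, under stated assumptions. Each token is assigned to its nearest ego-trajectory waypoint in the BEV plane (height is ignored), tokens are ordered by waypoint index so that tokens near the start of the planned path come before distant ones, and ties within a waypoint are broken by distance. The backward pass simply reverses the forward order. The names `trajectory_guided_order` and `bidirectional_orders` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def trajectory_guided_order(tokens: Sequence[Point3],
                            trajectory: Sequence[Point2]) -> List[int]:
    """Hypothetical ordering: sort tokens along the ego trajectory, near to far."""
    keys = []
    for idx, (x, y, _z) in enumerate(tokens):
        # BEV distance to every waypoint; token height is ignored (assumption).
        dists = [math.hypot(x - wx, y - wy) for wx, wy in trajectory]
        nearest = min(range(len(trajectory)), key=dists.__getitem__)
        # Group by waypoint index (local), then by distance within the group.
        keys.append((nearest, dists[nearest], idx))
    keys.sort()
    return [idx for _, _, idx in keys]


def bidirectional_orders(tokens: Sequence[Point3],
                         trajectory: Sequence[Point2]) -> Tuple[List[int], List[int]]:
    """Forward order plus its reverse, one per scan direction."""
    forward = trajectory_guided_order(tokens, trajectory)
    return forward, forward[::-1]


# Usage: a straight three-waypoint ego path and four tokens.
path = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
toks = [(9.0, 1.0, 0.0), (0.5, 0.5, 0.0), (4.0, -1.0, 0.0), (11.0, 0.0, 1.0)]
fwd, bwd = bidirectional_orders(toks, path)
print(fwd, bwd)  # [1, 2, 3, 0] [0, 3, 2, 1]
```

Grouping by waypoint keeps spatially nearby tokens adjacent in the scan, which is the locality property a recurrent scan relies on, since distant positions in the sequence interact only through the decaying state.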
The authors propose a unified Mamba decoder built from bidirectional Mamba blocks that processes task queries and sensor tokens in parallel. Because the tasks are not arranged in a manually fixed sequential order, relations between them are modeled dynamically, and the model scales through simple layer stacking with linear complexity.
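The sketch below illustrates processing task queries and sensor tokens as one sequence with bidirectional scans. To stay short, it replaces the selective Mamba SSM with a fixed-decay linear recurrence, so it shows the data flow rather than the paper's actual block. The names `causal_scan`, `bidirectional_block`, `unified_decoder`, the `decay` parameter, and the residual form are assumptions. Because the backward scan runs over the reversed sequence, queries placed at the front still receive information from every sensor token, and depth grows by stacking identical blocks.

```python
from typing import List, Tuple

Vector = List[float]


def causal_scan(seq: List[Vector], decay: float) -> List[Vector]:
    """Per-channel recurrence h_t = decay * h_{t-1} + x_t (stand-in for a Mamba SSM)."""
    h = [0.0] * len(seq[0])
    out = []
    for x in seq:
        h = [decay * h_c + x_c for h_c, x_c in zip(h, x)]
        out.append(h)
    return out


def bidirectional_block(seq: List[Vector], decay: float) -> List[Vector]:
    fwd = causal_scan(seq, decay)
    bwd = causal_scan(seq[::-1], decay)[::-1]  # scan reversed sequence, then re-align
    # Residual form (assumption): input + forward context + backward context.
    return [[x + f + b for x, f, b in zip(xs, fs, bs)]
            for xs, fs, bs in zip(seq, fwd, bwd)]


def unified_decoder(task_queries: List[Vector], sensor_tokens: List[Vector],
                    num_layers: int = 2, decay: float = 0.5
                    ) -> Tuple[List[Vector], List[Vector]]:
    seq = task_queries + sensor_tokens  # one joint sequence, no hand-designed task order
    for _ in range(num_layers):         # scaling = stacking more identical blocks
        seq = bidirectional_block(seq, decay)
    n_q = len(task_queries)
    return seq[:n_q], seq[n_q:]


# Usage: two zero-initialised task queries followed by three sensor tokens.
queries, sensors = unified_decoder([[0.0], [0.0]], [[1.0], [2.0], [3.0]], num_layers=1)
print(queries)  # [[0.6875], [1.375]]
```

The queries start at zero yet end up non-zero after one layer, which shows that the backward scan carries sensor information to tokens earlier in the sequence.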
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
DriveMamba: Task-Centric Scalable State Space Model paradigm for E2E-AD
[70] Efficient Long-Range Context Modeling for Motion Forecasting with State Space Models
[71] Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics
[72] SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving
[73] Multi-Agent Motion Forecasting via Mixed Supervision
[74] Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
[75] Online Joint State Inference and Learning of Partially Unknown State-Space Models
[76] A Comprehensive Survey on World Models for Embodied AI
[77] World-Model based Hierarchical Planning with Semantic Communications for Autonomous Driving
Bidirectional trajectory-guided local-to-global scan method
[51] OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
[61] Parting with Misconceptions about Learning-based Vehicle Motion Planning
[62] GenAD: Generative End-to-End Autonomous Driving
[63] Autonomous Vehicle Motion Planning
[64] Search-Based Task and Motion Planning for Hybrid Systems: Agile Autonomous Vehicles
[65] Collaborative Motion Planning Based on the Improved Ant Colony Algorithm for Multiple Autonomous Vehicles
[66] Set-based trajectory planning for a car-like vehicle
[67] Three-Dimensional Flight Corridor: An Occupancy Checking Process for Unmanned Aerial Vehicle Motion Planning inside Confined Spaces
[68] An RRT-Dijkstra-Based Path Planning Strategy for Autonomous Vehicles
[69] Agile Decision-Making and Safety-Critical Motion Planning for Emergency Autonomous Vehicles
Unified Mamba decoder for parallel task modeling