VADv2: End-to-End Autonomous Driving via Probabilistic Planning
Overview
Overall Novelty Assessment
The paper proposes VADv2, an end-to-end autonomous driving system that models planning as a probabilistic field function over discretized action tokens. It resides in the 'Probabilistic Action Distribution Models' leaf under 'End-to-End Learning Architectures', which contains only two papers. This sparse population suggests that tokenizing continuous spatiotemporal action spaces into discrete vocabularies for probabilistic planning remains relatively unexplored. The taxonomy shows this to be one focused direction within a broader landscape of end-to-end methods, in contrast to neighboring leaves such as 'Generative Planning Models' and 'Multimodal Foundation Model Integration', which pursue different architectural strategies.
The taxonomy structure shows VADv2 sits adjacent to several related but distinct research directions. Neighboring leaves include 'Generative Planning Models' using diffusion or GANs for trajectory distributions, 'Deterministic End-to-End Models' that regress actions without probabilistic modeling, and 'Uncertainty-Aware Representation Learning' focusing on world models. The broader 'Modular Planning Frameworks' branch offers an alternative philosophy with explicit perception-prediction-planning separation. VADv2's positioning emphasizes learning probabilistic distributions directly from demonstrations within a unified architecture, diverging from both deterministic regression approaches and modular systems that maintain structured intermediate representations for interpretability.
Among the thirty candidates examined, the 'VADv2 end-to-end driving model with action space tokenization' contribution shows the most substantial overlap with prior work: three of its ten examined candidates were flagged as potentially refuting it. The other two contributions (the probabilistic planning paradigm and the benchmark performance claims) drew no clear refutations from their respective ten-candidate searches. This pattern suggests that the core architectural innovation, action space tokenization, has more direct precedents within the limited search scope, while the broader framing as probabilistic planning and the empirical results appear less directly challenged. The analysis covers only top-K semantic matches and citation expansion; it is not an exhaustive literature review.
Based on the limited thirty-candidate search, VADv2 appears to occupy a sparsely populated research direction with one sibling paper in its taxonomy leaf. The tokenization approach shows measurable prior work within the examined scope, while the probabilistic planning framing and performance claims lack clear refutations among candidates reviewed. The taxonomy context reveals this work contributes to ongoing debates between monolithic learned models and hybrid systems, though the search scope cannot definitively assess novelty across all related end-to-end or modular planning literature.
Claimed Contributions
The authors introduce a probabilistic planning approach that models the planning policy as a scene-conditioned nonstationary stochastic process p(a|o), using a probabilistic field function to map actions to probability distributions. This addresses the uncertainty inherent in planning by learning from large-scale driving demonstrations, unlike deterministic methods that directly regress actions.
The authors present VADv2, which discretizes the high-dimensional continuous planning action space into a planning vocabulary, tokenizes both sensor data and planning actions, uses Transformer-based interaction between planning tokens and scene tokens, and samples actions from the learned probability distribution for vehicle control.
The authors demonstrate that VADv2 achieves state-of-the-art results on CARLA Town05, NAVSIM, and a 3DGS-based benchmark in both closed-loop and open-loop evaluation settings, with validation through extensive simulations and real-world deployment showing effectiveness and stability.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
Contribution Analysis
Detailed comparisons for each claimed contribution
Probabilistic planning paradigm for end-to-end autonomous driving
The authors introduce a probabilistic planning approach that models the planning policy as a scene-conditioned nonstationary stochastic process p(a|o), using a probabilistic field function to map actions to probability distributions. This addresses the uncertainty inherent in planning by learning from large-scale driving demonstrations, unlike deterministic methods that directly regress actions.
[1] VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
[3] Planning for Autonomous Driving via Interaction-Aware Probabilistic Action Policies
[51] Visually-guided motion planning for autonomous driving from interactive demonstrations
[52] Learning Autonomous Vehicle Safety Concepts from Demonstrations
[53] A survey on imitation learning techniques for end-to-end autonomous vehicles
[54] Learning-enabled decision-making for autonomous driving: framework and methodology
[55] Safe and Interpretable Human-Like Planning With Transformer-Based Deep Inverse Reinforcement Learning for Autonomous Driving
[56] Improved deep reinforcement learning with expert demonstrations for urban autonomous driving
[57] How To Not Drive: Learning Driving Constraints from Demonstration
[58] Deep reinforcement learning based local planner for UAV obstacle avoidance using demonstration data
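As a rough illustration of the probabilistic planning idea assessed in this contribution, the sketch below scores a small set of candidate actions against a scene representation, normalizes the scores into a distribution p(a|o), and samples one action from it. The dot-product scoring, the toy two-dimensional embeddings, and all function names are assumptions made for illustration; they stand in for the paper's learned model and are not taken from it.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def action_distribution(action_embeddings, scene_embedding):
    """Toy stand-in for p(a|o): dot-product score per action, then softmax."""
    logits = [
        sum(a_i * s_i for a_i, s_i in zip(action, scene_embedding))
        for action in action_embeddings
    ]
    return softmax(logits)

def sample_action(probs, rng):
    """Draw one action index from the categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical embeddings for three candidate actions and one scene.
actions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
scene = [2.0, 0.5]

probs = action_distribution(actions, scene)
chosen = sample_action(probs, random.Random(0))
```

A deterministic regressor would output a single action for the scene; here every candidate keeps a nonzero probability, which is the distinction the contribution description draws.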
VADv2 end-to-end driving model with action space tokenization
The authors present VADv2, which discretizes the high-dimensional continuous planning action space into a planning vocabulary, tokenizes both sensor data and planning actions, uses Transformer-based interaction between planning tokens and scene tokens, and samples actions from the learned probability distribution for vehicle control.
[60] Behavior Generation with Latent Actions
[62] Drivinggpt: Unifying driving world modeling and planning with multi-modal autoregressive transformers
[63] Doe-1: Closed-Loop Autonomous Driving with Large World Model
[59] Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling
[61] ADAPT: Action-aware Driving Caption Transformer
[64] Smart: Scalable multi-agent real-time motion generation via next-token prediction
[65] Categorical traffic transformer: Interpretable and diverse behavior prediction with tokenized latent
[66] Vision-Language Cross-Attention for Real-Time Autonomous Driving
[67] Tokenize the world into object-level knowledge to address long-tail events in autonomous driving
[68] Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?
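To make the action space tokenization in this contribution concrete, the sketch below builds a small planning vocabulary from demonstration trajectories and maps a continuous trajectory to its nearest vocabulary entry. The greedy farthest-point selection, the mean waypoint distance, and the toy demonstrations are assumptions chosen for illustration; the description above does not specify how the vocabulary is constructed.

```python
import math

def trajectory_distance(a, b):
    """Mean Euclidean distance between matching waypoints of two trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def build_vocabulary(trajectories, size):
    """Greedy farthest-point selection of up to `size` representative trajectories."""
    chosen = [0]  # start from the first trajectory (arbitrary choice)
    # Distance from every trajectory to its nearest chosen representative.
    nearest = [trajectory_distance(t, trajectories[0]) for t in trajectories]
    while len(chosen) < min(size, len(trajectories)):
        idx = max(range(len(trajectories)), key=nearest.__getitem__)
        chosen.append(idx)
        for i, t in enumerate(trajectories):
            nearest[i] = min(nearest[i], trajectory_distance(t, trajectories[idx]))
    return [trajectories[i] for i in chosen]

def tokenize(trajectory, vocabulary):
    """Return the index of the vocabulary entry closest to `trajectory`."""
    return min(
        range(len(vocabulary)),
        key=lambda i: trajectory_distance(trajectory, vocabulary[i]),
    )

# Hypothetical demonstrations: two (x, y) waypoints each.
demos = [
    [(0.0, 1.0), (0.0, 2.0)],    # straight ahead
    [(0.1, 1.0), (0.2, 2.0)],    # almost straight
    [(1.0, 1.0), (2.0, 2.0)],    # veer right
    [(-1.0, 1.0), (-2.0, 2.0)],  # veer left
]

vocab = build_vocabulary(demos, size=3)
token = tokenize([(0.9, 1.0), (1.9, 2.0)], vocab)
```

Farthest-point selection spreads the vocabulary across the demonstrated behaviors, so the near-duplicate "almost straight" trajectory is skipped in favor of the two turning maneuvers. Once actions are discrete tokens, a model can assign each one a probability and sample from the result.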
State-of-the-art planning performance across multiple benchmarks
The authors demonstrate that VADv2 achieves state-of-the-art results on CARLA Town05, NAVSIM, and a 3DGS-based benchmark in both closed-loop and open-loop evaluation settings, with validation through extensive simulations and real-world deployment showing effectiveness and stability.