AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving

ICLR 2026 Conference Submission · Anonymous Authors
Applications · Robots · Vision–Language–Action Models
Abstract:

Vision–Language–Action (VLA) models in autonomous driving systems have recently demonstrated transformative potential by integrating multimodal perception with decision-making capabilities. However, the interpretability and coherence of the decision process and the plausibility of action sequences remain largely underexplored. To address these issues, we propose AutoDrive-R², a novel VLA framework that enhances both the reasoning and self-reflection capabilities of autonomous driving systems through chain-of-thought (CoT) processing and reinforcement learning (RL). Specifically, we first propose an innovative CoT dataset named nuScenesR²-6K for supervised fine-tuning, which effectively builds cognitive bridges between input information and output trajectories through a four-step logical chain with self-reflection for validation. Moreover, to maximize both reasoning and self-reflection during the RL stage, we further employ the Group Relative Policy Optimization (GRPO) algorithm within a physics-grounded reward framework that incorporates spatial alignment, vehicle dynamics, and temporal smoothness criteria to ensure reliable and realistic trajectory planning. Extensive evaluation results on both the nuScenes and Waymo datasets demonstrate the state-of-the-art performance and robust generalization capacity of our proposed method.
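For readers unfamiliar with GRPO, the sketch below illustrates the core idea the abstract refers to: rewards for a group of rollouts sampled for the same scene are normalized against that group's own statistics, removing the need for a learned value critic. The function and the example reward values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def group_relative_advantages(group_rewards, eps=1e-8):
    """GRPO-style advantage estimate: normalize each rollout's scalar
    reward by the mean and standard deviation of its sampling group,
    so no separate value critic is required."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical usage: several candidate trajectories are sampled for the
# same driving scene, each scored by the physics-grounded reward, and the
# policy gradient is weighted by these group-relative advantages.
rewards = [0.62, 0.48, 0.91, 0.55]           # illustrative reward values
print(group_relative_advantages(rewards))    # above-average rollouts > 0
```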

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes AutoDrive-R², a VLA framework combining chain-of-thought reasoning with self-reflection for autonomous driving. It resides in the Chain-of-Thought Reasoning leaf, which contains four papers total, indicating a moderately populated research direction. This leaf sits within the broader Reasoning Enhancement Mechanisms branch, which encompasses multiple reasoning paradigms including counterfactual analysis and adaptive strategies. The framework's dual emphasis on reasoning and self-reflection positions it at the intersection of structured cognitive processing and validation mechanisms.

The taxonomy reveals that Chain-of-Thought Reasoning neighbors Counterfactual and Self-Reflective Reasoning (two papers) and Adaptive Reasoning Strategies (two papers), suggesting the field is exploring diverse approaches to interpretable decision-making. The broader Multimodal Integration Architectures branch addresses complementary challenges like spatial awareness and unified perception-action frameworks. AutoDrive-R²'s physics-grounded reward framework connects it to Training Paradigms and Optimization, particularly Reinforcement Learning and Online Optimization, indicating cross-cutting methodological contributions beyond pure reasoning architecture.

Among the thirty candidates examined, the core VLA framework contribution shows potential overlap with two prior works, while the nuScenesR²-6K dataset and the physics-grounded GRPO method were each compared against ten candidates with no clear refutations. The dataset contribution appears more distinctive, as no examined work provides a comparable four-step logical chain with self-reflection annotations. The GRPO method's novelty is less clear given the limited search scope, though the specific combination of spatial alignment, vehicle dynamics, and temporal smoothness criteria may differentiate it from existing RL approaches in this domain.

Based on top-thirty semantic matches, the framework's reasoning architecture faces some prior work overlap, while the dataset and training methodology appear more novel within the examined scope. The analysis does not cover exhaustive literature on general VLA models or broader autonomous driving systems, focusing specifically on reasoning-enhanced approaches. The taxonomy structure suggests this is an active but not overcrowded research area, with room for contributions that meaningfully advance interpretability and self-correction capabilities.

Taxonomy

Core-task Taxonomy Papers: 45
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 2

Research Landscape Overview

Core task: Enhancing reasoning and self-reflection in vision-language-action models for autonomous driving.

The field has evolved into several distinct branches that address different facets of building intelligent driving agents. Reasoning Enhancement Mechanisms explore how to inject structured thought processes—such as chain-of-thought or reflective loops—into model predictions, enabling systems to articulate intermediate steps before committing to actions. Multimodal Integration Architectures focus on fusing visual, linguistic, and sometimes spatial or temporal cues within unified frameworks, as seen in works like CoVLA[3] and OmniReason[4]. Training Paradigms and Optimization investigate learning strategies, from imitation and reinforcement learning to self-supervised techniques that leverage large-scale driving data. Datasets and Benchmarks provide standardized evaluation protocols, while Specialized Applications and Extensions tackle domain-specific challenges like safety-critical scenarios or real-time deployment. Surveys and Conceptual Frameworks, including A Survey on Vision-Language-Action[1], offer high-level perspectives on the landscape, and Direct Vision-Action Mapping examines end-to-end approaches that bypass explicit linguistic reasoning.

Within Reasoning Enhancement Mechanisms, a particularly active line of work centers on chain-of-thought reasoning, where models generate intermediate rationales to improve decision transparency and robustness. AutoDrive-R²[0] exemplifies this direction by emphasizing both reasoning and self-reflection, aiming to produce interpretable driving decisions that can be scrutinized and refined. Nearby efforts such as CoT4AD[35] and CoC-VLA[40] similarly adopt structured reasoning chains but may differ in how they balance computational overhead against interpretability gains. A key trade-off across these methods is whether to prioritize explicit linguistic explanations—which enhance human trust and debugging—or to streamline inference for real-time performance. AutoDrive-R²[0] sits squarely in the chain-of-thought cluster, sharing the goal of transparent reasoning with CoT4AD[35] while potentially exploring deeper self-correction loops that distinguish it from simpler one-pass chain-of-thought approaches. This positioning highlights ongoing questions about how much reasoning depth is necessary for safe, reliable autonomous driving.

Claimed Contributions

AutoDrive-R² VLA framework with reasoning and self-reflection

The authors propose a Vision-Language-Action framework that enhances autonomous driving by incorporating chain-of-thought reasoning and self-reflection capabilities, enabling the system to generate physically feasible trajectories while providing interpretable decision-making processes.

10 retrieved papers
Can Refute
nuScenesR²-6K chain-of-thought dataset

The authors introduce the first autonomous driving dataset that includes not only ground-truth trajectories but also structured reasoning steps through a four-step logical chain (observation, calculation, logical deductions, reflection) to train models with both reasoning and self-reflection capabilities.

10 retrieved papers
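To make the four-step logical chain described above concrete, one plausible layout for a single training sample is sketched below; the field names and example values are assumptions for illustration and do not reflect the actual nuScenesR²-6K schema.

```python
# Hypothetical structure of one chain-of-thought planning sample
# (keys and values are illustrative, not the released annotation format).
sample = {
    "scene_inputs": {
        "camera_views": ["CAM_FRONT", "CAM_FRONT_LEFT", "CAM_FRONT_RIGHT"],
        "ego_state": {"speed_mps": 7.2, "heading_deg": 3.5},
    },
    "reasoning_chain": {
        "observation": "A pedestrian is entering the crosswalk about 15 m ahead.",
        "calculation": "At 7.2 m/s the gap closes in roughly 2 s; braking at 2 m/s^2 keeps a >5 m margin.",
        "logical_deduction": "Decelerate to about 4 m/s and hold the current lane until the crosswalk clears.",
        "reflection": "The planned deceleration stays within comfort limits, so the trajectory is accepted.",
    },
    "ground_truth_trajectory": [(0.0, 0.0), (3.4, 0.1), (6.2, 0.3), (8.5, 0.6)],  # (x, y) waypoints in metres
}
```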
Physics-grounded GRPO reinforcement learning method

The authors develop a reinforcement learning approach using Group Relative Policy Optimization with a physics-grounded reward framework that incorporates spatial alignment, vehicle dynamics, and temporal smoothness constraints to ensure physically feasible and realistic trajectory planning.

10 retrieved papers
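The sketch below shows one way a composite reward of this kind could be assembled from spatial-alignment, vehicle-dynamics, and temporal-smoothness terms; the particular error measures, bounds, and weights are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def physics_grounded_reward(pred, gt, dt=0.5,
                            w_spatial=1.0, w_dyn=0.5, w_smooth=0.5,
                            a_max=3.0):
    """Composite trajectory reward (illustrative weights and bounds).

    pred, gt: (T, 2) arrays of predicted / ground-truth waypoints in metres,
    sampled every dt seconds.
    """
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)

    # Spatial alignment: reward shrinks with the average displacement error.
    ade = np.linalg.norm(pred - gt, axis=1).mean()
    r_spatial = np.exp(-ade)

    # Vehicle dynamics: penalize accelerations beyond a plausible bound.
    vel = np.diff(pred, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    excess = np.clip(np.linalg.norm(acc, axis=1) - a_max, 0.0, None)
    r_dynamics = np.exp(-excess.mean()) if len(excess) else 1.0

    # Temporal smoothness: penalize jerk, the change in acceleration.
    jerk = np.diff(acc, axis=0) / dt
    r_smooth = np.exp(-np.linalg.norm(jerk, axis=1).mean()) if len(jerk) else 1.0

    return w_spatial * r_spatial + w_dyn * r_dynamics + w_smooth * r_smooth

# Hypothetical usage with short 4-point trajectories (x, y in metres).
pred = [(0.0, 0.0), (3.2, 0.1), (6.0, 0.4), (8.3, 0.7)]
gt   = [(0.0, 0.0), (3.4, 0.1), (6.2, 0.3), (8.5, 0.6)]
print(physics_grounded_reward(pred, gt))
```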

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

AutoDrive-R² VLA framework with reasoning and self-reflection

The authors propose a Vision-Language-Action framework that enhances autonomous driving by incorporating chain-of-thought reasoning and self-reflection capabilities, enabling the system to generate physically feasible trajectories while providing interpretable decision-making processes.

Contribution

nuScenesR²-6K chain-of-thought dataset

The authors introduce the first autonomous driving dataset that includes not only ground-truth trajectories but also structured reasoning steps through a four-step logical chain (observation, calculation, logical deductions, reflection) to train models with both reasoning and self-reflection capabilities.

Contribution

Physics-grounded GRPO reinforcement learning method

The authors develop a reinforcement learning approach using Group Relative Policy Optimization with a physics-grounded reward framework that incorporates spatial alignment, vehicle dynamics, and temporal smoothness constraints to ensure physically feasible and realistic trajectory planning.