SafeMVDrive: Multi-view Safety-Critical Driving Video Generation in the Real World Domain

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: autonomous driving testing, safety-critical scenario, video generation, safety
Abstract:

Safety-critical scenarios are essential for evaluating autonomous driving (AD) systems, yet they are rare in practice. Existing generators produce trajectories, simulations, or single-view videos, but they do not produce what modern AD systems actually consume: realistic multi-view video. We present SafeMVDrive, the first framework for generating multi-view safety-critical driving videos in the real-world domain. SafeMVDrive couples a safety-critical trajectory engine with a diffusion-based multi-view video generator through three design choices. First, we pick the right adversary: a GRPO-fine-tuned vision-language model (VLM) that understands multi-camera context and selects the vehicles most likely to induce hazards. Second, we generate the right motion: a two-stage trajectory process that (i) produces collision trajectories and then (ii) transforms them into natural evasion trajectories, preserving risk while staying within what current video generators can faithfully render. Third, we synthesize the right data: a diffusion model turns these trajectories into multi-view videos suitable for end-to-end planners. Evaluated against a strong end-to-end planner, our videos substantially increase the collision rate, exposing brittle behavior and providing targeted stress tests for planning modules. Our code and video examples are available at: https://iclr-1.github.io/SMD/.

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and claimed contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

SafeMVDrive introduces a framework coupling adversarial trajectory generation with diffusion-based multi-view video synthesis for safety-critical driving scenarios. It resides in the Safety-Critical Scenario Video Synthesis leaf, which contains four papers including the original work. This leaf represents a focused research direction within the broader Multi-View Driving Video Generation Frameworks branch. The taxonomy reveals this is a moderately populated niche: while general multi-view generation has multiple sub-directions (layout-guided, world-model-based, instance-aware), the safety-critical video synthesis cluster remains relatively compact, suggesting an emerging rather than saturated area.

The taxonomy positions SafeMVDrive adjacent to General Multi-View Video Generation methods emphasizing layout control and world modeling, plus Safety-Critical Data Synthesis approaches using simulation pipelines or LLM-based scripting. The scope notes clarify boundaries: SafeMVDrive differs from general multi-view generators by explicitly targeting adversarial scenarios, and from simulation-based methods by producing real-world-domain videos rather than synthetic simulator outputs. Neighboring leaves like Controllable Multi-View Video Generation share controllability goals but lack the safety-critical focus. This structural context suggests SafeMVDrive bridges video synthesis realism with hazard-injection control, occupying a distinct position between photorealistic generation and safety-oriented data augmentation.

Among the seventeen candidates examined, no contribution was clearly refuted. For the core SafeMVDrive framework, ten candidates were examined with zero refutations; for the VLM-based adversarial selector, two candidates, none overlapping; for the two-stage trajectory generator, five candidates, none constituting prior work. These statistics reflect a limited semantic-search scope, not exhaustive coverage. The absence of refutations across all contributions suggests either genuine novelty within the examined set or that closely related prior work lies outside the top seventeen semantic matches. The VLM-based selector and the trajectory generator appear particularly under-explored in the candidate pool, though this may reflect search limitations rather than absolute novelty.

Given the limited search scope of seventeen candidates, the analysis indicates SafeMVDrive occupies a sparsely populated intersection of multi-view video generation and safety-critical scenario synthesis. The taxonomy structure confirms this is an emerging direction with few direct competitors in the examined literature. However, the small candidate pool and absence of refutations warrant caution: a broader search might reveal closer prior work, particularly in adjacent simulation-based or controllability-focused methods not captured by semantic similarity.

Taxonomy

Core-task taxonomy papers: 27
Claimed contributions: 3
Contribution candidate papers compared: 17
Refutable papers: 0

Research Landscape Overview

Core task: multi-view safety-critical driving video generation. The field has evolved around several complementary branches. Multi-View Driving Video Generation Frameworks encompass methods that synthesize realistic driving scenes from multiple camera perspectives, often leveraging diffusion models or neural rendering techniques to produce temporally coherent outputs. Safety-Critical Data Synthesis and Simulation focuses on generating rare or dangerous scenarios, such as near-collisions or adverse weather, that are underrepresented in real-world datasets, enabling robust testing of autonomous systems. Perception and Reasoning Systems for Autonomous Driving address how models interpret and reason about generated or real video streams, while Specialized Applications and Adaptation explores domain-specific tasks like pedestrian editing or traffic-light recognition. Survey and Review Literature provides broader context on generalization challenges and physical-world deployment.

Representative works include DrivingDiffusion[8] and InstaDrive[3] for general multi-view synthesis, and SafeMVDrive Synthesis[1] for safety-focused generation. Recent efforts reveal a tension between photorealism and controllability: some approaches prioritize high-fidelity rendering using world models like Gaia[6] or Cosmos Drive Dreams[5], while others such as DrivingGen[9] and Challenger[10] emphasize structured control over safety-critical events. SafeMVDrive[0] sits within the Safety-Critical Scenario Video Synthesis cluster, closely aligned with SafeMVDrive Synthesis[1] and Challenger[10], which similarly target rare-event generation. Compared to InstaDrive[3], which focuses on instant multi-view synthesis without explicit safety constraints, SafeMVDrive[0] places greater emphasis on controllable hazard injection and multi-camera consistency under critical conditions.

Open questions remain around balancing realism with the diversity of edge cases, and around how best to integrate these synthetic scenarios into end-to-end training pipelines for perception and planning.

Claimed Contributions

SafeMVDrive framework for multi-view safety-critical video generation

The authors introduce SafeMVDrive, a novel framework that couples a safety-critical trajectory engine with a diffusion-based multi-view video generator to produce realistic multi-view safety-critical driving videos suitable for evaluating end-to-end autonomous driving systems.

10 retrieved papers
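The coupling described in this contribution can be pictured as a three-step pipeline: select an adversary, generate its safety-critical trajectory, then render multi-view video. The following is a minimal illustrative sketch only; every name here (`select_adversary`, `render_multiview_video`, etc.) is hypothetical and stands in for the paper's learned components:

```python
"""Sketch of a trajectory-to-video pipeline in the spirit of SafeMVDrive.
All classes and functions are hypothetical stand-ins, not the authors' API."""

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """A sequence of (x, y, heading) states for one vehicle."""
    states: List[Tuple[float, float, float]]


def select_adversary(scene_agents: List[str]) -> str:
    """Stand-in for the VLM-based selector: pick the agent most likely
    to induce a hazard (here, trivially the first agent)."""
    return scene_agents[0]


def generate_safety_critical_trajectory(agent: str) -> Trajectory:
    """Stand-in for the two-stage trajectory engine."""
    return Trajectory(states=[(0.0, 0.0, 0.0), (1.0, 0.2, 0.05)])


def render_multiview_video(traj: Trajectory, num_cameras: int = 6) -> list:
    """Stand-in for the diffusion video generator: one frame list per camera."""
    return [[f"cam{c}_frame{i}" for i in range(len(traj.states))]
            for c in range(num_cameras)]


def pipeline(scene_agents: List[str]) -> list:
    adversary = select_adversary(scene_agents)
    traj = generate_safety_critical_trajectory(adversary)
    return render_multiview_video(traj)


views = pipeline(["car_12", "car_7"])  # 6 camera views, one frame per state
```

The point of the sketch is the data flow, not the components: each stage's output (agent id, trajectory, frames) is the next stage's input, which is what lets a trajectory engine and a video generator be developed and swapped independently.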
VLM-based adversarial vehicle selector with visual context

The authors propose a GRPO-fine-tuned vision-language model that leverages multi-view camera images to identify adversarial vehicles most likely to induce safety-critical scenarios, addressing limitations of prior non-visual heuristic selection methods.

2 retrieved papers
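GRPO fine-tuning, mentioned in this contribution, scores a group of sampled responses and normalizes each reward against the group rather than using a learned critic. A minimal sketch of that group-relative advantage computation, with made-up reward values (the paper's actual reward design is not shown here):

```python
"""Illustrative sketch of GRPO's group-relative advantage normalization.
Reward values are invented for the example; e.g. imagine reward 1.0 when a
chosen adversary vehicle later induces a near-collision, else 0.0."""

from statistics import mean, pstdev


def group_relative_advantages(rewards):
    """Normalize each sampled response's reward against the group mean
    and standard deviation, as GRPO does in place of a value network."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]


# Hypothetical rewards for four sampled adversary selections in one group.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)  # [1.0, -1.0, -1.0, 1.0]
```

Responses whose reward beats the group mean get positive advantage and are reinforced; the rest are penalized, which is what steers the VLM toward selecting vehicles that actually lead to hazardous outcomes.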
Two-stage collision-evasion trajectory generator

The authors develop a two-stage trajectory generation process that first creates collision trajectories and then refines them into natural evasion trajectories, preserving safety-critical characteristics while remaining compatible with current video generation models that cannot realistically render collisions.

5 retrieved papers
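The two-stage idea, risk-preserving approach followed by a rendered evasion, can be illustrated with a toy geometric example. This is a hand-written numeric sketch of the concept only; the paper's generator is a learned model, and all parameters below (`swerve_from`, `lateral_gain`) are invented:

```python
"""Toy sketch of a two-stage collision-then-evasion trajectory:
stage 1 aims straight at the ego vehicle, stage 2 keeps the risky
approach but swerves laterally at the end so no collision is rendered."""


def collision_trajectory(start_x, ego_x, steps):
    """Stage 1: a straight-line approach ending at the ego position."""
    dx = (ego_x - start_x) / steps
    return [(start_x + i * dx, 0.0) for i in range(steps + 1)]


def to_evasion(traj, swerve_from=0.6, lateral_gain=2.0):
    """Stage 2: leave the first part of the path (the safety-critical
    approach) untouched and add a growing lateral offset after the
    swerve point, turning the collision into a near-miss."""
    n = len(traj)
    out = []
    for i, (x, y) in enumerate(traj):
        frac = i / (n - 1)
        if frac > swerve_from:
            y += lateral_gain * (frac - swerve_from)
        out.append((x, y))
    return out


risky = collision_trajectory(start_x=-10.0, ego_x=0.0, steps=10)
evasive = to_evasion(risky)  # same approach, nonzero lateral offset at the end
```

The design choice this illustrates is the one claimed in the contribution: the hazard lives in the approach, so editing only the final segment preserves the safety-critical character while keeping the scene within what current video generators can render faithfully.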

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

The per-contribution comparison details restate the three claimed contributions listed above.