DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
Overview
Overall Novelty Assessment
The paper proposes DrivingGen, a comprehensive benchmark for evaluating generative driving world models across multiple dimensions including visual quality, trajectory plausibility, temporal consistency, and controllability. Within the taxonomy, it resides in the 'Comprehensive Benchmarking Frameworks' leaf under 'Evaluation Frameworks and Benchmarking'. This leaf contains only two papers total (including DrivingGen), indicating a relatively sparse research direction. The sibling paper is WorldSimBench, suggesting that holistic, multi-dimensional evaluation frameworks for driving world models represent an emerging but not yet crowded area of investigation.
The taxonomy reveals that most research activity concentrates on model architectures and generation mechanisms, with substantial work in diffusion-based models (11 papers across three sub-leaves) and data generation for downstream tasks (10 papers across four sub-leaves). The evaluation branch sits somewhat apart from these technical development efforts. Neighboring leaves include 'Survey and Taxonomic Reviews' (4 papers) which provide broader field overviews, and the various architecture categories which propose the models that benchmarks like DrivingGen aim to assess. The scope_note for this leaf explicitly excludes papers proposing models without comprehensive evaluation frameworks, clarifying that DrivingGen's focus on systematic assessment distinguishes it from generation-focused work.
Among the three contributions analyzed, the benchmark dataset contribution was checked against 10 candidates, one of which potentially refutes its novelty, suggesting some overlap with existing evaluation datasets. The novel metrics contribution was checked against 10 candidates with none clearly refuting it, indicating this aspect may be more distinctive. The comprehensive model evaluation contribution was checked against only 2 candidates, with no refutations found. Given the limited search scope of 22 candidates in total, these statistics suggest moderate novelty for the metrics and evaluation methodology, while the dataset contribution faces more substantial prior work within the examined literature.
Based on the limited top-22 semantic search results, DrivingGen appears to occupy a relatively underexplored niche focused on holistic benchmarking rather than model development. The sparse population of its taxonomy leaf and the low refutation rate (one potential refutation across 22 candidates) suggest the work addresses a recognized gap, though the small candidate pool means potentially relevant evaluation frameworks outside the search scope remain unexamined. The analysis captures the paper's positioning within known benchmarking efforts but cannot assess novelty against the broader evaluation literature.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce DrivingGen, a comprehensive benchmark that includes a carefully curated evaluation dataset covering diverse driving conditions such as varied weather (rain, snow, fog), times of day (dawn, day, night), global geographic regions, and complex driving maneuvers. This dataset addresses the limited diversity in existing benchmarks like nuScenes and OpenDV.
The authors propose a novel suite of evaluation metrics specifically designed for driving scenarios. These metrics comprehensively evaluate four dimensions: distribution-level measures for videos and trajectories, quality metrics accounting for perceptual and driving-specific factors, temporal consistency at scene and agent levels, and trajectory alignment measuring controllability.
The authors conduct extensive benchmarking of 14 generative world models spanning general video models, physics-based models, and driving-specific models. This evaluation reveals important insights about trade-offs between visual quality and physical consistency, providing the first comprehensive comparison in the driving domain.
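The distribution-level dimension of the metric suite described above is typically instantiated as a Fréchet distance between Gaussian feature statistics of real and generated videos, as in FID/FVD. The following is a minimal NumPy sketch of that standard technique, assuming pre-extracted feature vectors; it is an illustration of the general approach, not DrivingGen's exact formulation:

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between two feature sets, each modeled as a
    multivariate Gaussian (the technique behind FID/FVD-style
    distribution-level metrics). Inputs: (N, D) arrays of features."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    diff = mu1 - mu2
    # Tr(sqrtm(S1 @ S2)) computed from the eigenvalues of the product;
    # for PSD covariances these are real and non-negative up to noise.
    eigvals = np.linalg.eigvals(s1 @ s2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
same = frechet_distance(rng.normal(0, 1, (500, 8)), rng.normal(0, 1, (500, 8)))
shifted = frechet_distance(rng.normal(0, 1, (500, 8)), rng.normal(2, 1, (500, 8)))
```

A mean-shifted generated distribution scores much higher than a matched one, which is the behavior a distribution-level video or trajectory metric relies on.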
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[37] Worldsimbench: Towards video generation models as world simulators PDF
Contribution Analysis
Detailed comparisons for each claimed contribution
DrivingGen benchmark with diverse evaluation dataset
The authors introduce DrivingGen, a comprehensive benchmark that includes a carefully curated evaluation dataset covering diverse driving conditions such as varied weather (rain, snow, fog), times of day (dawn, day, night), global geographic regions, and complex driving maneuvers. This dataset addresses the limited diversity in existing benchmarks like nuScenes and OpenDV.
[63] Bdd100k: A diverse driving dataset for heterogeneous multitask learning PDF
[14] Generalized predictive model for autonomous driving PDF
[59] SID: Stereo Image Dataset for Autonomous Driving in Adverse Conditions PDF
[60] S2R-Bench: A Sim-to-Real Evaluation Benchmark for Autonomous Driving PDF
[61] A survey on autonomous driving datasets: Statistics, annotation quality, and a future outlook PDF
[62] Towards a Transitional Weather Scene Recognition Approach for Autonomous Vehicles PDF
[64] Augmented Cross Layer Refinement Network-Based Lane Detection in Adverse Weather Conditions PDF
[65] One million scenes for autonomous driving: Once dataset PDF
[66] ACDC: The adverse conditions dataset with correspondences for semantic driving scene understanding PDF
[67] Ithaca365: Dataset and driving perception under repeated and challenging weather conditions PDF
Novel multifaceted metrics for driving world models
The authors propose a novel suite of evaluation metrics specifically designed for driving scenarios. These metrics comprehensively evaluate four dimensions: distribution-level measures for videos and trajectories, quality metrics accounting for perceptual and driving-specific factors, temporal consistency at scene and agent levels, and trajectory alignment measuring controllability.
[4] Drivedreamer: Towards real-world-drive world models for autonomous driving PDF
[37] Worldsimbench: Towards video generation models as world simulators PDF
[51] Probing multimodal llms as world models for driving PDF
[52] Gigaworld-0: World models as data engine to empower embodied ai PDF
[53] Lidardm: Generative lidar simulation in a generated world PDF
[54] Geodrive: 3d geometry-informed driving world model with precise action control PDF
[55] Panacea: Panoramic and controllable video generation for autonomous driving PDF
[56] World-in-world: World models in a closed-loop world PDF
[57] Act-bench: Towards action controllable world models for autonomous driving PDF
[58] Seeing Clearly, Forgetting Deeply: Revisiting Fine-Tuned Video Generators for Driving Simulation PDF
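The trajectory-alignment dimension in the claim above scores how closely a model's generated ego motion follows the conditioning trajectory. A common way to quantify this is average and final displacement error (ADE/FDE); the sketch below illustrates that standard measure and is not necessarily the paper's exact alignment metric:

```python
import numpy as np

def ade_fde(generated, reference):
    """Average and final displacement error between a generated
    trajectory and the conditioning reference, given as (T, 2)
    arrays of ego positions in meters."""
    dists = np.linalg.norm(generated - reference, axis=1)
    return float(dists.mean()), float(dists[-1])

# Hypothetical example: a straight reference path and a generation
# that tracks it with a constant 0.5 m lateral offset.
ref = np.stack([np.linspace(0, 10, 20), np.zeros(20)], axis=1)
gen = ref + np.array([0.0, 0.5])
ade, fde = ade_fde(gen, ref)  # → (0.5, 0.5)
```

Lower values indicate tighter controllability: the generated rollout stays close to the trajectory it was asked to follow.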
Comprehensive evaluation of 14 state-of-the-art models
The authors conduct extensive benchmarking of 14 generative world models spanning general video models, physics-based models, and driving-specific models. This evaluation reveals important insights about trade-offs between visual quality and physical consistency, providing the first comprehensive comparison in the driving domain.