RAP: 3D Rasterization Augmented End-to-End Planning

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Autonomous Driving, Planning, Sim-to-Real
Abstract:

Imitation learning for end-to-end driving trains policies only on expert demonstrations. Once deployed in a closed loop, such policies lack recovery data: small mistakes cannot be corrected and quickly compound into failures. A promising direction is to generate alternative viewpoints and trajectories beyond the logged path. Prior work explores photorealistic digital twins via neural rendering or game engines, but these methods are prohibitively slow and costly, and thus mainly used for evaluation. In this work, we argue that photorealism is unnecessary for training end-to-end planners. What matters is semantic fidelity and scalability: driving depends on geometry and dynamics, not textures or lighting. Motivated by this, we propose 3D Rasterization, which replaces costly rendering with lightweight rasterization of annotated primitives, enabling augmentations such as counterfactual recovery maneuvers and cross-agent view synthesis. To transfer these synthetic views effectively to real-world deployment, we introduce a Raster-to-Real (R2R) feature-space alignment that bridges the sim-to-real gap at the representation level. Together, these components form the Rasterization Augmented Planning (RAP) pipeline, a scalable data augmentation framework for planning. RAP achieves state-of-the-art closed-loop robustness and long-tail generalization, ranking 1st on four major benchmarks: NAVSIM v1/v2, Waymo Open Dataset Vision-based E2E Driving, and Bench2Drive. Our results demonstrate that lightweight rasterization with feature alignment suffices to scale end-to-end training, offering a practical alternative to photorealistic rendering. Code will be released.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a rasterization-based augmentation framework for end-to-end driving, replacing photorealistic rendering with lightweight primitive rasterization to generate counterfactual recovery maneuvers and cross-agent viewpoints. Within the taxonomy, it occupies the 'Rasterization and Primitive-Based Rendering' leaf under 'Synthetic Data Generation and Augmentation Methods'. This leaf currently contains only the original paper itself, with no sibling papers identified. This isolation suggests the rasterization-based approach represents a relatively sparse research direction compared to neighboring leaves like 'Diffusion-Based Video and Image Generation' (four papers) or 'Simulation-Based Data Generation' (two papers).

The taxonomy reveals a crowded landscape of synthetic augmentation methods. Adjacent leaves include diffusion-based synthesis (Synthetic Diffusion Driving, four papers), style transfer techniques (one paper), and simulation-based generation (two papers). The scope notes clarify boundaries: diffusion methods prioritize photorealism via generative models, while simulation-based approaches use physics engines or world models. The original paper explicitly diverges by arguing photorealism is unnecessary—semantic fidelity and scalability matter more. This positions the work as a computational efficiency alternative to heavier generative pipelines, though it shares the broader goal of expanding training data beyond logged expert demonstrations.

Among eighteen candidates examined across three contributions, none were found to clearly refute the proposed methods. The '3D Rasterization Pipeline' examined six candidates with zero refutations; 'Raster-to-Real Feature Alignment' examined ten with zero refutations; 'RAP Framework with Counterfactual Augmentation' examined two with zero refutations. This absence of overlapping prior work within the limited search scope suggests the specific combination of lightweight rasterization, feature-space sim-to-real alignment, and counterfactual augmentation strategies has not been directly addressed in the top-eighteen semantically similar papers. However, the search scale is modest and does not cover the full breadth of autonomous driving or graphics literature.

Based on the limited search scope, the work appears to introduce a distinct technical approach within a broader augmentation landscape. The taxonomy structure shows active research in diffusion-based and simulation-based generation, but the rasterization-based direction remains sparsely populated. The contribution-level statistics indicate no direct prior work among examined candidates, though this reflects top-eighteen semantic matches rather than exhaustive coverage. The novelty assessment is thus conditional on the search boundaries and may shift with deeper exploration of graphics-oriented or real-time rendering communities.

Taxonomy

Core-task taxonomy papers: 30
Claimed contributions: 3
Contribution candidate papers compared: 18
Refutable papers: 0

Research Landscape Overview

Core task: end-to-end autonomous driving planning with data augmentation. The field addresses the challenge of training robust driving policies by combining neural architectures that map sensor inputs directly to control outputs with techniques that expand or enrich the training data.

The taxonomy reveals five main branches. (1) Synthetic Data Generation and Augmentation Methods explores how to create or transform driving scenarios, ranging from diffusion-based scene synthesis (Synthetic Diffusion Driving[2]) to rasterization and primitive-based rendering (RAP Rasterization Planning[0]), to overcome data scarcity and improve generalization. (2) End-to-End Learning Architectures and Training Frameworks focuses on neural network designs and optimization strategies that enable direct sensor-to-action mappings (Modularization Free Driving[1]). (3) Imitation Learning and Behavioral Cloning investigates how to distill expert demonstrations into policies, often addressing distribution shift and covariate mismatch (Robust Behavioral Cloning[28], Imitation Data Augmentation[9]). (4) Trajectory Selection and Multi-Modal Planning deals with generating and scoring multiple candidate trajectories to handle uncertainty and diverse driving behaviors (Multimodal Trajectory Scoring[19], DistillDrive Multimode[10]). (5) Reinforcement Learning for Autonomous Driving examines reward-driven policy optimization, including safe exploration and sample efficiency (Expert Parking RL[15]).

A particularly active line of work centers on how synthetic augmentation interacts with end-to-end architectures: some studies leverage diffusion models or style transfer (Style Transfer Augmentation[20]) to diversify training scenes, while others use lightweight rasterization to rapidly generate varied traffic configurations. RAP Rasterization Planning[0] sits within the rasterization and primitive-based rendering cluster, emphasizing efficient scene rendering to augment planning data. This contrasts with heavier generative approaches such as Synthetic Diffusion Driving[2], which prioritizes photorealistic synthesis at higher computational cost, and with simulation-based methods (Robust Control Simulation[3]) that rely on physics engines.

Meanwhile, the imitation learning branch grapples with the trade-off between data quality and quantity: augmenting expert demonstrations can mitigate overfitting, but poorly designed augmentations risk introducing spurious correlations. Across these directions, open questions remain about the optimal balance between synthetic diversity and real-world fidelity, and about how to integrate augmentation seamlessly into multi-modal planning frameworks that must reason over diverse candidate trajectories.

Claimed Contributions

3D Rasterization Pipeline for Driving Scene Reconstruction

The authors introduce a lightweight, training-free rasterization method that converts annotated driving logs (lane polylines, agent cuboids) into perspective camera views. This approach prioritizes semantic and geometric fidelity over photorealism, enabling fast and controllable scene generation for end-to-end planning.

6 retrieved papers
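To make the pipeline description above concrete, the following is a minimal, hypothetical sketch of how annotated primitives (lane polylines and agent cuboids) could be projected into a perspective camera view with a pinhole model. All function names, intrinsics, and coordinate conventions here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: projecting annotated primitives into a perspective view.
# Camera frame convention assumed: x right, y down, z forward (pinhole model).

def project_point(p_cam, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    x, y, z = p_cam
    if z <= 0:  # behind the camera: not visible
        return None
    return (fx * x / z + cx, fy * y / z + cy)

def cuboid_corners(center, size):
    """Eight corners of an axis-aligned agent cuboid given center and (l, w, h)."""
    px, py, pz = center
    l, w, h = size
    return [(px + dx, py + dy, pz + dz)
            for dx in (-l / 2, l / 2)
            for dy in (-w / 2, w / 2)
            for dz in (-h / 2, h / 2)]

def rasterize_primitives(polylines, cuboids, fx=1000.0, fy=1000.0,
                         cx=960.0, cy=540.0):
    """Project lane polylines and cuboid corners into pixel space,
    dropping points that fall behind the camera."""
    out = {"lanes": [], "agents": []}
    for pl in polylines:
        pts = [project_point(p, fx, fy, cx, cy) for p in pl]
        out["lanes"].append([p for p in pts if p is not None])
    for center, size in cuboids:
        pts = [project_point(c, fx, fy, cx, cy)
               for c in cuboid_corners(center, size)]
        out["agents"].append([p for p in pts if p is not None])
    return out
```

Because this step is pure projection with no learned components, it is training-free and cheap, which is the property the contribution claims over photorealistic rendering.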
Raster-to-Real (R2R) Feature-Space Alignment Module

The authors propose a feature-space alignment technique that minimizes the domain gap between rasterized and real images at both spatial and global levels, using MSE loss and gradient reversal for domain confusion. This enables effective transfer from synthetic rasterized inputs to real-world deployment.

10 retrieved papers
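The two alignment signals described above can be sketched as follows. This is a minimal illustration, assuming DANN-style gradient reversal for domain confusion; shapes, names, and the reversal strength are assumptions rather than the paper's specification.

```python
# Hypothetical sketch of the two R2R alignment signals:
# (1) spatial MSE between rasterized and real feature maps,
# (2) a gradient-reversal hook for adversarial domain confusion.

def mse_alignment_loss(feat_raster, feat_real):
    """Mean squared error between two equally-sized feature maps
    (flat lists of floats here, standing in for C x H x W tensors)."""
    assert len(feat_raster) == len(feat_real)
    return sum((a - b) ** 2
               for a, b in zip(feat_raster, feat_real)) / len(feat_raster)

class GradReverse:
    """Identity in the forward pass; scales gradients by -lambda in the
    backward pass, so the feature extractor learns to confuse a domain
    classifier (DANN-style adversarial alignment)."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad):
        return [-self.lam * g for g in grad]  # reversed, scaled gradient
```

The MSE term pulls rasterized features toward their real counterparts pointwise, while the reversed gradient makes the shared encoder's global statistics indistinguishable across domains, matching the spatial-plus-global framing in the contribution summary.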
RAP Framework with Counterfactual Augmentation Strategies

The authors develop RAP, a complete data augmentation framework that combines 3D rasterization with recovery-oriented trajectory perturbations and cross-agent view synthesis. This framework addresses covariate shift in imitation learning by generating diverse training scenarios beyond logged expert demonstrations.

2 retrieved papers
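A recovery-oriented perturbation of the kind described above can be illustrated as follows: displace the ego start pose laterally off the logged path, then synthesize a target trajectory that decays back onto the expert waypoints. The linear decay schedule and the offset magnitude are illustrative assumptions, not the paper's actual scheme.

```python
# Hypothetical sketch of a recovery-oriented counterfactual: start displaced
# from the expert path and blend back onto it over the planning horizon.

def recovery_trajectory(expert_xy, lateral_offset, horizon=None):
    """Given expert waypoints [(x, y), ...], shift the first pose by
    `lateral_offset` in y and decay the offset linearly to zero, producing
    a 'recover toward the demonstration' training target."""
    n = len(expert_xy) if horizon is None else min(horizon, len(expert_xy))
    out = []
    for i in range(n):
        w = 1.0 - i / max(n - 1, 1)  # 1 at the start, 0 at the last point
        x, y = expert_xy[i]
        out.append((x, y + w * lateral_offset))
    return out
```

Pairing such perturbed trajectories with views rasterized from the displaced pose is what supplies the recovery data that pure imitation on logged demonstrations lacks.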

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

3D Rasterization Pipeline for Driving Scene Reconstruction

Six candidate papers were examined for this contribution; none were found to refute it.

Contribution

Raster-to-Real (R2R) Feature-Space Alignment Module

Ten candidate papers were examined for this contribution; none were found to refute it.

Contribution

RAP Framework with Counterfactual Augmentation Strategies

Two candidate papers were examined for this contribution; none were found to refute it.