FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Autonomous Driving, World Model, End-to-End, Vision-Language-Action Model, Scene Flow
Abstract:

Effective environment modeling is the foundation of autonomous driving, underpinning tasks from perception to planning. However, current paradigms often inadequately consider the feedback of ego motion on observation, which leads to an incomplete understanding of the driving process and consequently limits planning capability. To address this issue, we introduce a novel ego-scene interactive modeling paradigm. Inspired by human cognition, the paradigm represents ego-scene interaction as the scene flow relative to the ego vehicle. This conceptualization allows ego-motion feedback to be modeled within a feature-learning pattern, advantageously utilizing existing log-replay datasets rather than relying on scenario simulations. We specifically propose FlowAD, a general flow-based framework for autonomous driving. Within it, an ego-guided scene partition first constructs basic flow units to quantify scene flow; the ego vehicle's forward direction and steering velocity directly shape the partition, which thus reflects ego motion. Then, based on these flow units, spatial and temporal flow predictions model the dynamics of scene flow, encompassing both spatial displacement and temporal variation. A final task-aware enhancement exploits the learned spatio-temporal flow dynamics to benefit diverse tasks through object- and region-level strategies. We also propose a novel Frames before Correct Planning (FCP) metric to assess scene-understanding capability. Experiments in both open- and closed-loop evaluations demonstrate FlowAD's generality and effectiveness across perception, end-to-end planning, and VLM analysis. Notably, FlowAD reduces collision rate by 19% over SparseDrive with FCP improvements of 1.39 frames (60%) on nuScenes, and achieves an impressive driving score of 51.77 on Bench2Drive, demonstrating its superiority. Code, models, and configurations will be released.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces FlowAD, a flow-based framework for autonomous driving that models ego-scene interaction through scene flow relative to the ego vehicle. Within the taxonomy, it occupies a unique leaf node under End-to-End Planning and Decision-Making labeled 'Ego-Scene Interactive Flow-Based Planning,' with no sibling papers in this specific category. This positioning suggests the work addresses a relatively sparse research direction, as the taxonomy contains 50 papers across approximately 36 topics, yet this particular formulation of ego-motion feedback through scene flow appears underexplored in the surveyed literature.

The taxonomy reveals that FlowAD sits within a broader End-to-End Planning branch containing eight subcategories, including Generative and Diffusion-Based Planning (e.g., DiffusionDrive), Deterministic and Reinforcement Learning-Based Planning, and Interaction-Aware and Graph-Based Planning (e.g., GraphAD). Neighboring branches include World Models and Generative Scene Simulation, which focuses on explicit future state prediction through occupancy grids or video generation, and Trajectory Prediction and Motion Forecasting, which emphasizes multi-agent dynamics without integrated planning. FlowAD's flow-based formulation distinguishes it from diffusion methods by offering tractable likelihood modeling, while its ego-guided scene partition diverges from graph-structured approaches by directly encoding ego motion into spatial decomposition.

Among the 29 candidates examined across the three claimed contributions, no clearly refutable prior work was identified: 9 candidates were examined for the ego-scene interactive modeling paradigm, 10 for the FlowAD framework, and 10 for the Frames before Correct Planning metric, with 0 refutations in each case. This suggests that within the limited search scope (primarily top-K semantic matches and citation expansion), the specific combination of scene flow representation, ego-motion feedback modeling, and flow-based planning appears novel. However, the analysis explicitly notes that this is not an exhaustive literature search, and the absence of refutations reflects the examined sample rather than comprehensive field coverage.

Given the limited search scope of 29 candidates, the work appears to introduce a distinctive approach within end-to-end planning by formalizing ego-scene interaction as relative scene flow and leveraging normalizing flows for probabilistic trajectory generation. The sparse population of its taxonomy leaf and the absence of refutable candidates among examined papers suggest potential novelty, though the analysis cannot rule out relevant prior work outside the semantic search radius or in adjacent subfields not fully captured by the taxonomy structure.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 0

Research Landscape Overview

Core task: ego-scene interactive modeling for autonomous driving. This field addresses how an autonomous vehicle perceives, predicts, and plans within dynamic traffic environments where the ego agent must continuously reason about interactions with other road users. The taxonomy organizes research into several major branches: World Models and Generative Scene Simulation focuses on learning predictive models of future scene evolution, often using generative architectures like diffusion or autoregressive frameworks (e.g., GAIA-1[2], GenAD[3], OccWorld[6]); End-to-End Planning and Decision-Making emphasizes direct mapping from perception to control, including flow-based and reinforcement learning approaches; Trajectory Prediction and Motion Forecasting targets anticipating the future paths of surrounding agents through joint or marginal modeling (e.g., GameFormer[7], Scene Transformer[35]); Perception and Scene Understanding covers representation learning and semantic reasoning; Simulation and Testing Environments provides platforms for closed-loop evaluation (e.g., Waymax[22]); Multi-Agent Systems and Communication explores coordination and information exchange; and Specialized Interaction Scenarios examines context-specific behaviors such as pedestrian crossings or merging maneuvers.

A central tension across these branches is the trade-off between interpretability and end-to-end performance: world models offer explicit future rollouts and controllable generation but may struggle with real-time constraints, while end-to-end planners can be faster yet less transparent.

Within the End-to-End Planning branch, FlowAD[0] adopts a flow-based formulation for interactive planning, distinguishing itself from diffusion-based methods like DiffusionDrive[4] by leveraging normalizing flows for tractable likelihood modeling and efficient sampling. This approach contrasts with graph-structured planners such as GraphAD[5], which explicitly encode spatial relationships, and with reinforcement learning pipelines that require extensive online interaction. FlowAD[0] thus occupies a niche emphasizing probabilistic modeling with closed-form density estimation, bridging generative scene simulation and direct planning in a computationally efficient manner while maintaining a degree of interpretability through its flow architecture.
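The tractable-likelihood contrast drawn above rests on the standard change-of-variables identity behind normalizing flows. The sketch below is a generic one-dimensional affine flow used purely for illustration; it is not the paper's model (whose abstract frames "flow" in terms of scene flow), and all names are hypothetical.

```python
import math

# Change-of-variables identity behind normalizing flows:
# for an invertible map x = f(z) with z ~ N(0, 1),
#   log p(x) = log p_z(f^{-1}(x)) + log |d f^{-1}(x) / dx|.
# With a 1-D affine flow x = a*z + b, the Jacobian term is -log|a|,
# so both density evaluation and sampling are closed-form.

def affine_flow_logpdf(x: float, a: float, b: float) -> float:
    """Exact log-density of x = a*z + b with z ~ standard normal."""
    z = (x - b) / a                                   # inverse transform
    log_pz = -0.5 * (z * z + math.log(2 * math.pi))   # base log-density
    return log_pz - math.log(abs(a))                  # + log-det of inverse

def affine_flow_sample(z: float, a: float, b: float) -> float:
    """Sampling is a single forward pass through the flow."""
    return a * z + b

# The flow reproduces the N(b, a^2) density exactly, e.g. at its mode:
print(round(affine_flow_logpdf(3.0, a=2.0, b=3.0), 4))  # -1.6121
```

This closed-form log-density is what distinguishes flow-based models from diffusion models, which only bound or approximate the likelihood.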

Claimed Contributions

Ego-scene interactive modeling paradigm for autonomous driving

The authors propose a new paradigm that models the feedback of ego motion on environmental observation by representing ego-scene interaction as scene flow relative to the ego vehicle. This approach enables ego-motion feedback to be modeled within feature learning using log-replay datasets rather than requiring scenario simulations.

9 retrieved papers
FlowAD: a general flow-based framework for autonomous driving

The authors introduce FlowAD, a framework comprising three core components: ego-guided scene partition that constructs flow units reflecting ego motion, spatial and temporal flow predictions that model scene flow dynamics, and task-aware enhancement that exploits learned dynamics to benefit diverse downstream tasks.

10 retrieved papers
Frames before Correct Planning (FCP) metric

The authors introduce a new evaluation metric that counts the number of frames elapsed before a planner initiates a rational action in response to a given command, providing a statistical measure of the planner's comprehension of the driving process under ego-scene interaction.

10 retrieved papers
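The paper's exact rationality criterion and aggregation are not specified in this report, but the counting idea behind FCP can be sketched as follows. All names are illustrative, and the per-frame rationality labels are assumed to come from some external judge.

```python
# Hypothetical sketch of the Frames before Correct Planning (FCP) idea:
# count the frames that elapse before the planner's action first matches
# a "rational" response to the active command.

from typing import Sequence

def fcp_for_episode(rational_flags: Sequence[bool]) -> int:
    """Frames elapsed before the first rational action in one episode.

    `rational_flags[t]` is True if the planner's action at frame t is
    judged rational with respect to the command. Returns the index of
    the first True flag, or the episode length if the planner never
    produces a rational action.
    """
    for t, ok in enumerate(rational_flags):
        if ok:
            return t
    return len(rational_flags)

def mean_fcp(episodes: Sequence[Sequence[bool]]) -> float:
    """Average FCP over a set of command episodes (lower is better)."""
    return sum(fcp_for_episode(e) for e in episodes) / len(episodes)

# Example: the planner reacts at frame 2 in one episode, frame 0 in another.
episodes = [[False, False, True, True], [True, True]]
print(mean_fcp(episodes))  # 1.0
```

Under this reading, the abstract's reported improvement of 1.39 frames would correspond to a lower average frame count before the first rational action.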

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In the retrieved landscape it therefore appears structurally isolated, which is one partial signal of novelty, though still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Ego-scene interactive modeling paradigm for autonomous driving

The authors propose a new paradigm that models the feedback of ego motion on environmental observation by representing ego-scene interaction as scene flow relative to the ego vehicle. This approach enables ego-motion feedback to be modeled within feature learning using log-replay datasets rather than requiring scenario simulations.

Contribution 2: FlowAD, a general flow-based framework for autonomous driving

The authors introduce FlowAD, a framework comprising three core components: ego-guided scene partition that constructs flow units reflecting ego motion, spatial and temporal flow predictions that model scene flow dynamics, and task-aware enhancement that exploits learned dynamics to benefit diverse downstream tasks.

Contribution 3: Frames before Correct Planning (FCP) metric

The authors introduce a new evaluation metric that counts the number of frames elapsed before a planner initiates a rational action in response to a given command, providing a statistical measure of the planner's comprehension of the driving process under ego-scene interaction.