FlowAD: Ego-Scene Interactive Modeling for Autonomous Driving

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Autonomous Driving, World Model, End-to-End, Vision-Language-Action Model, Scene Flow
Abstract:

Effective environment modeling is the foundation of autonomous driving, underpinning tasks from perception to planning. However, current paradigms often inadequately consider the feedback of ego motion on observation, which leads to an incomplete understanding of the driving process and consequently limits planning capability. To address this issue, we introduce a novel ego-scene interactive modeling paradigm. Inspired by human cognition, the paradigm represents ego-scene interaction as the scene flow relative to the ego vehicle. This conceptualization allows ego-motion feedback to be modeled within a feature-learning pattern, advantageously utilizing existing log-replay datasets rather than relying on scenario simulations. We specifically propose FlowAD, a general flow-based framework for autonomous driving. Within it, an ego-guided scene partition first constructs basic flow units to quantify scene flow; the ego vehicle's forward direction and steering velocity directly shape the partition, which thus reflects ego motion. Then, based on these flow units, spatial and temporal flow predictions model the dynamics of scene flow, encompassing both spatial displacement and temporal variation. A final task-aware enhancement exploits the learned spatio-temporal flow dynamics to benefit diverse tasks through object- and region-level strategies. We also propose a novel Frames before Correct Planning (FCP) metric to assess scene-understanding capability. Experiments in both open- and closed-loop evaluations demonstrate FlowAD's generality and effectiveness across perception, end-to-end planning, and VLM analysis. Notably, FlowAD reduces collision rate by 19% over SparseDrive with FCP improvements of 1.39 frames (60%) on nuScenes, and achieves an impressive driving score of 51.77 on Bench2Drive, demonstrating its superiority. Code, models, and configurations will be released.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces FlowAD, a flow-based framework for autonomous driving that models ego-scene interaction through scene flow relative to the ego vehicle. Within the taxonomy, it occupies a unique leaf node under End-to-End Planning and Decision-Making labeled 'Ego-Scene Interactive Flow-Based Planning,' with no sibling papers in this specific category. This positioning suggests the work addresses a relatively sparse research direction, as the taxonomy contains 50 papers across approximately 36 topics, yet this particular formulation of ego-motion feedback through scene flow appears underexplored in the surveyed literature.

The taxonomy reveals that FlowAD sits within a broader End-to-End Planning branch containing eight subcategories, including Generative and Diffusion-Based Planning (e.g., DiffusionDrive), Deterministic and Reinforcement Learning-Based Planning, and Interaction-Aware and Graph-Based Planning (e.g., GraphAD). Neighboring branches include World Models and Generative Scene Simulation, which focuses on explicit future state prediction through occupancy grids or video generation, and Trajectory Prediction and Motion Forecasting, which emphasizes multi-agent dynamics without integrated planning. FlowAD's flow-based formulation distinguishes it from diffusion methods by offering tractable likelihood modeling, while its ego-guided scene partition diverges from graph-structured approaches by directly encoding ego motion into spatial decomposition.

Among the 29 candidates examined across the three claimed contributions, no clearly refutable prior work was identified: 9 candidates were examined for the ego-scene interactive modeling paradigm, 10 for the FlowAD framework, and 10 for the Frames before Correct Planning metric, with 0 refutations in each case. This suggests that within the limited search scope (primarily top-K semantic matches and citation expansion), the specific combination of scene flow representation, ego-motion feedback modeling, and flow-based planning appears novel. However, the analysis explicitly notes that this is not an exhaustive literature search, and the absence of refutations reflects the examined sample rather than comprehensive field coverage.

Given the limited search scope of 29 candidates, the work appears to introduce a distinctive approach within end-to-end planning by formalizing ego-scene interaction as relative scene flow and leveraging normalizing flows for probabilistic trajectory generation. The sparse population of its taxonomy leaf and the absence of refutable candidates among examined papers suggest potential novelty, though the analysis cannot rule out relevant prior work outside the semantic search radius or in adjacent subfields not fully captured by the taxonomy structure.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 0

Research Landscape Overview

Core task: ego-scene interactive modeling for autonomous driving. This field addresses how an autonomous vehicle perceives, predicts, and plans within dynamic traffic environments where the ego agent must continuously reason about interactions with other road users. The taxonomy organizes research into several major branches: World Models and Generative Scene Simulation focuses on learning predictive models of future scene evolution, often using generative architectures like diffusion or autoregressive frameworks (e.g., GAIA-1[2], GenAD[3], OccWorld[6]); End-to-End Planning and Decision-Making emphasizes direct mapping from perception to control, including flow-based and reinforcement learning approaches; Trajectory Prediction and Motion Forecasting targets anticipating the future paths of surrounding agents through joint or marginal modeling (e.g., GameFormer[7], Scene Transformer[35]); Perception and Scene Understanding covers representation learning and semantic reasoning; Simulation and Testing Environments provides platforms for closed-loop evaluation (e.g., Waymax[22]); Multi-Agent Systems and Communication explores coordination and information exchange; and Specialized Interaction Scenarios examines context-specific behaviors such as pedestrian crossings or merging maneuvers.

A central tension across these branches is the trade-off between interpretability and end-to-end performance: world models offer explicit future rollouts and controllable generation but may struggle with real-time constraints, while end-to-end planners can be faster yet less transparent.

Within the End-to-End Planning branch, FlowAD[0] adopts a flow-based formulation for interactive planning, distinguishing itself from diffusion-based methods like DiffusionDrive[4] by leveraging normalizing flows for tractable likelihood modeling and efficient sampling. This approach contrasts with graph-structured planners such as GraphAD[5], which explicitly encode spatial relationships, and with reinforcement learning pipelines that require extensive online interaction. FlowAD[0] thus occupies a niche emphasizing probabilistic modeling with closed-form density estimation, bridging generative scene simulation and direct planning in a computationally efficient manner while maintaining a degree of interpretability through its flow architecture.
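The tractable-likelihood contrast drawn above rests on the standard change-of-variables identity behind normalizing flows. The sketch below is a generic one-dimensional affine flow used purely for illustration; it is not the paper's model (whose abstract frames "flow" in terms of scene flow), and all names are hypothetical.

```python
import math

# Change-of-variables identity behind normalizing flows:
# for an invertible map x = f(z) with z ~ N(0, 1),
#   log p(x) = log p_z(f^{-1}(x)) + log |d f^{-1}(x) / dx|.
# With a 1-D affine flow x = a*z + b, the Jacobian term is -log|a|,
# so both density evaluation and sampling are closed-form.

def affine_flow_logpdf(x: float, a: float, b: float) -> float:
    """Exact log-density of x = a*z + b with z ~ standard normal."""
    z = (x - b) / a                                   # inverse transform
    log_pz = -0.5 * (z * z + math.log(2 * math.pi))   # base log-density
    return log_pz - math.log(abs(a))                  # + log-det of inverse

def affine_flow_sample(z: float, a: float, b: float) -> float:
    """Sampling is a single forward pass through the flow."""
    return a * z + b

# The flow reproduces the N(b, a^2) density exactly, e.g. at its mode:
print(round(affine_flow_logpdf(3.0, a=2.0, b=3.0), 4))  # -1.6121
```

This closed-form log-density is what distinguishes flow-based models from diffusion models, which only bound or approximate the likelihood.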

Claimed Contributions

Ego-scene interactive modeling paradigm for autonomous driving

The authors propose a new paradigm that models the feedback of ego motion on environmental observation by representing ego-scene interaction as scene flow relative to the ego vehicle. This approach enables ego-motion feedback to be modeled within feature learning using log-replay datasets rather than requiring scenario simulations.

9 retrieved papers
FlowAD: a general flow-based framework for autonomous driving

The authors introduce FlowAD, a framework comprising three core components: ego-guided scene partition that constructs flow units reflecting ego motion, spatial and temporal flow predictions that model scene flow dynamics, and task-aware enhancement that exploits learned dynamics to benefit diverse downstream tasks.

10 retrieved papers
Frames before Correct Planning (FCP) metric

The authors introduce a new evaluation metric that counts the number of frames elapsed before a planner initiates a rational action in response to a given command, providing a statistical measure of the planner's comprehension of the driving process under ego-scene interaction.

10 retrieved papers
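The paper's exact rationality criterion and aggregation are not specified in this report, but the counting idea behind FCP can be sketched as follows. All names are illustrative, and the per-frame rationality labels are assumed to come from some external judge.

```python
# Hypothetical sketch of the Frames before Correct Planning (FCP) idea:
# count the frames that elapse before the planner's action first matches
# a "rational" response to the active command.

from typing import Sequence

def fcp_for_episode(rational_flags: Sequence[bool]) -> int:
    """Frames elapsed before the first rational action in one episode.

    `rational_flags[t]` is True if the planner's action at frame t is
    judged rational with respect to the command. Returns the index of
    the first True flag, or the episode length if the planner never
    produces a rational action.
    """
    for t, ok in enumerate(rational_flags):
        if ok:
            return t
    return len(rational_flags)

def mean_fcp(episodes: Sequence[Sequence[bool]]) -> float:
    """Average FCP over a set of command episodes (lower is better)."""
    return sum(fcp_for_episode(e) for e in episodes) / len(episodes)

# Example: the planner reacts at frame 2 in one episode, frame 0 in another.
episodes = [[False, False, True, True], [True, True]]
print(mean_fcp(episodes))  # 1.0
```

Under this reading, the abstract's reported improvement of 1.39 frames would correspond to a lower average frame count before the first rational action.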

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In the retrieved landscape it therefore appears structurally isolated, which is one partial signal of novelty, though still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Ego-scene interactive modeling paradigm for autonomous driving

The authors propose a new paradigm that models the feedback of ego motion on environmental observation by representing ego-scene interaction as scene flow relative to the ego vehicle. This approach enables ego-motion feedback to be modeled within feature learning using log-replay datasets rather than requiring scenario simulations.

Contribution 2: FlowAD, a general flow-based framework for autonomous driving

The authors introduce FlowAD, a framework comprising three core components: ego-guided scene partition that constructs flow units reflecting ego motion, spatial and temporal flow predictions that model scene flow dynamics, and task-aware enhancement that exploits learned dynamics to benefit diverse downstream tasks.

Contribution 3: Frames before Correct Planning (FCP) metric

The authors introduce a new evaluation metric that counts the number of frames elapsed before a planner initiates a rational action in response to a given command, providing a statistical measure of the planner's comprehension of the driving process under ego-scene interaction.