Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Reinforcement learning · Generative policy
Abstract:

Learning expressive and efficient policy functions is a promising direction in reinforcement learning (RL). While flow-based policies have recently proven effective at modeling complex action distributions with a fast, deterministic sampling process, they still face a trade-off between expressiveness and computational cost, typically controlled by the number of flow steps. In this work, we propose the mean flow policy (MFP), a new generative policy that models the mean velocity field to achieve one-step action generation, the fastest possible sampling regime. To preserve expressiveness, an instantaneous velocity constraint (IVC) is imposed on the mean velocity field during training. We theoretically prove that this constraint serves as a crucial boundary condition, improving learning accuracy and enhancing policy expressiveness. Empirically, MFP achieves state-of-the-art success rates across several challenging robotic manipulation tasks from Robomimic and OGBench, and it delivers substantial improvements in training and inference speed over existing flow-based policy baselines.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a mean flow policy (MFP) that learns a mean velocity field for one-step action generation in robotic manipulation. It resides in the 'Mean Velocity Field Approaches' leaf, which contains only two papers including this one. This is a relatively sparse research direction within the broader taxonomy of 28 papers across 15 leaf nodes, suggesting the specific formulation of mean velocity fields for direct one-step generation remains an emerging area compared to more crowded branches like consistency flow matching or acceleration techniques.

The taxonomy reveals several neighboring directions. Consistency Flow Matching (4 papers) and Rectified Flow Models (2 papers) pursue similar one-step or few-step generation goals but through different training objectives—consistency distillation or straight-trajectory rectification rather than mean velocity field learning. The Acceleration and Distillation branch (6 papers) addresses speed through post-hoc distillation or variance adaptation, while this work aims for inherent one-step efficiency. The scope notes clarify that mean velocity field methods exclude consistency-based reformulations, positioning MFP as a distinct approach within the flow matching paradigm.

Across the 30 candidates examined (10 per claimed contribution), the core contribution of a mean flow policy for one-step generation shows substantial prior-work overlap: 5 of its 10 candidates appear refutable, indicating that existing methods pursue similar one-step generation goals. The instantaneous velocity constraint (IVC) as a boundary condition shows no clear refutation among its 10 candidates, suggesting this theoretical contribution may be more novel. For the performance claim, 2 of 10 candidates appear refutable, implying that competitive baselines exist even if the specific speedup-accuracy trade-off differs. Because the search is limited to the top-30 semantic matches, these statistics are not exhaustive.

Based on the top-30 candidate analysis, the work appears to offer incremental refinement within an emerging subfield. The mean velocity field formulation has at least one sibling paper and multiple related approaches in neighboring leaves, while the IVC boundary condition lacks clear precedent in the examined literature. The sparse taxonomy leaf suggests room for methodological diversity, though the refutation statistics indicate the core one-step generation goal is not unprecedented.

Taxonomy

Core-task Taxonomy Papers: 28
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 7

Research Landscape Overview

Core task: one-step action generation for robotic manipulation using flow-based policies. The field has coalesced around learning continuous normalizing flows that map noise distributions to action distributions, enabling fast inference without iterative denoising.

The taxonomy reveals several complementary research directions. Flow Matching and Velocity Field Learning focuses on foundational training objectives, such as learning mean velocity fields or consistency-based formulations, that define how the flow evolves from a base distribution to the target policy. Acceleration and Distillation Techniques addresses inference speed by distilling multi-step flows into fewer steps or even single-step generators, often trading off some expressiveness for real-time performance. Multi-Step Flow Inference and Regularization explores structured sampling strategies and constraints that improve stability or sample quality when multiple flow steps are retained. Reinforcement Learning Integration examines how flow-based policies can be optimized via RL objectives, blending generative modeling with value-based or policy-gradient methods. Finally, Application-Specific Flow Policies tailors flow architectures to particular domains such as dexterous manipulation, bimanual coordination, or fabric handling.

Recent work has intensified efforts to achieve true one-step generation while preserving multimodal expressiveness. Mean Flow Velocity[0] and MP1 MeanFlow[4] exemplify mean velocity field approaches that directly predict the average flow direction, enabling efficient single-step inference without sacrificing the ability to capture diverse action modes. In contrast, Flowpolicy Consistency[3] and Flow Single Step[5] pursue consistency distillation or rectified-flow training to collapse multi-step trajectories into a single forward pass, often at the cost of additional training complexity or slight mode-collapse risk. Meanwhile, methods such as Maniflow[6] and FlowRAM[7] incorporate manifold constraints or memory-augmented architectures to handle high-dimensional or temporally extended action spaces. Mean Flow Velocity[0] sits squarely within the mean velocity field cluster, sharing the goal of direct one-step prediction with MP1 MeanFlow[4] but differing in how the mean field is regularized or conditioned on observations, and it offers a streamlined alternative to the iterative refinement seen in consistency-based approaches such as Flowpolicy Consistency[3].
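The flow matching objective that underlies these approaches can be made concrete with a minimal sketch. The linear interpolation path and its velocity target below follow the standard flow-matching/rectified-flow construction and are an illustrative assumption, not the exact formulation of any paper in the taxonomy:

```python
import numpy as np

def linear_interpolation_path(x0, x1, t):
    """Linear probability path z_t between a noise sample x0 and an action x1."""
    return (1.0 - t) * x0 + t * x1

def velocity_target(x0, x1):
    """Instantaneous velocity of the linear path, d z_t / d t = x1 - x0."""
    return x1 - x0

def flow_matching_loss(pred_velocity, x0, x1):
    """Mean squared error between a model's predicted velocity and the target."""
    target = velocity_target(x0, x1)
    return float(np.mean((pred_velocity - target) ** 2))

# Toy check: predicting the exact target velocity yields zero loss.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)   # noise sample
x1 = rng.standard_normal(4)   # action sample
assert flow_matching_loss(velocity_target(x0, x1), x0, x1) == 0.0
```

A one-step sampler is exact only when the learned velocity field is constant along the path, which is what motivates the mean velocity field and consistency approaches surveyed above.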

Claimed Contributions

Mean Flow Policy for one-step action generation

The authors introduce a novel flow-based policy that models the mean velocity field instead of instantaneous velocities. This design enables direct single-step mapping from Gaussian noise to multi-modal action distributions, eliminating the multi-step iterative sampling overhead of existing flow policies while preserving expressive power.

Retrieved papers: 10 · Verdict: Can Refute
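Under the mean velocity field view, one-step generation reduces to a single evaluation of the learned field over the whole interval. The sketch below assumes a MeanFlow-style update z_r = z_t - (t - r) u(z_t, r, t); the function names are illustrative placeholders, not the authors' API:

```python
import numpy as np

def one_step_action(mean_velocity_field, noise):
    """One-step sampling: integrate the mean velocity over the full
    interval [0, 1] in a single update, z_0 = z_1 - (1 - 0) * u(z_1, 0, 1)."""
    return noise - mean_velocity_field(noise, r=0.0, t=1.0)

# Toy mean velocity field whose exact one-step solution maps any noise
# sample to a fixed target action: u(z, r, t) = z - target on [0, 1].
target_action = np.array([0.5, -0.2])
u = lambda z, r, t: z - target_action

noise = np.random.default_rng(1).standard_normal(2)
action = one_step_action(u, noise)
assert np.allclose(action, target_action)
```

In the toy example the field is chosen so the one-step update is exact; for a learned field, the quality of the single step depends on how well the mean velocity is approximated.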
Instantaneous Velocity Constraint as boundary condition

The authors propose IVC, a training objective that pairs average velocity loss with instantaneous velocity loss at interval start points. They theoretically prove this constraint acts as a necessary boundary condition for the mean flow ODE, eliminating solution multiplicity and improving learning accuracy with negligible computational overhead.

Retrieved papers: 10 · Verdict: no clear refutation
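As described, the IVC objective pairs an average-velocity loss with an instantaneous-velocity loss at degenerate intervals r = t, where the mean velocity over [r, t] must equal the instantaneous velocity (the boundary condition u(z, t, t) = v(z, t)). A minimal sketch, with the weighting and the targets as illustrative assumptions rather than the paper's exact objective:

```python
import numpy as np

def ivc_loss(u_model, z_t, r, t, avg_target, inst_target, lam=1.0):
    """Average-velocity loss plus an instantaneous-velocity loss that
    enforces the boundary condition u(z, t, t) = v(z, t)."""
    avg_loss = np.mean((u_model(z_t, r, t) - avg_target) ** 2)
    # At r = t the mean velocity over [r, t] degenerates to the
    # instantaneous velocity, which pins down the boundary condition.
    ivc_term = np.mean((u_model(z_t, t, t) - inst_target) ** 2)
    return float(avg_loss + lam * ivc_term)

# Toy check: a field matching both targets incurs zero loss.
avg_v = np.array([1.0, 0.0])
inst_v = np.array([1.0, 0.0])
u = lambda z, r, t: avg_v
z = np.zeros(2)
assert ivc_loss(u, z, r=0.2, t=0.8, avg_target=avg_v, inst_target=inst_v) == 0.0
```

The extra term costs only one more evaluation of the model per training example, consistent with the claim of negligible computational overhead.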
State-of-the-art performance with substantial speedup

The authors demonstrate that their method achieves top success rates on Robomimic and OGBench benchmarks while delivering significant improvements in training and inference speed compared to multi-step flow policy baselines, highlighting practical applicability for real-time robotic control.

Retrieved papers: 10 · Verdict: Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Mean Flow Policy for one-step action generation
Contribution 2: Instantaneous Velocity Constraint as boundary condition
Contribution 3: State-of-the-art performance with substantial speedup
