Mean Flow Policy with Instantaneous Velocity Constraint for One-step Action Generation

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Reinforcement learning · Generative policy
Abstract:

Learning expressive and efficient policy functions is a promising direction in reinforcement learning (RL). While flow-based policies have recently proven effective at modeling complex action distributions with a fast, deterministic sampling process, they still face a trade-off between expressiveness and computational cost, typically controlled by the number of flow steps. In this work, we propose the mean flow policy (MFP), a new generative policy that models the mean velocity field to achieve one-step action generation, the fastest possible sampling regime. To preserve expressiveness, an instantaneous velocity constraint (IVC) is imposed on the mean velocity field during training. We theoretically prove that this constraint serves as a crucial boundary condition, improving learning accuracy and enhancing policy expressiveness. Empirically, MFP achieves state-of-the-art success rates across several challenging robotic manipulation tasks from Robomimic and OGBench, and it delivers substantial improvements in training and inference speed over existing flow-based policy baselines.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a mean flow policy (MFP) that learns a mean velocity field for one-step action generation in robotic manipulation. It resides in the 'Mean Velocity Field Approaches' leaf, which contains only two papers including this one. This is a relatively sparse research direction within the broader taxonomy of 28 papers across 15 leaf nodes, suggesting the specific formulation of mean velocity fields for direct one-step generation remains an emerging area compared to more crowded branches like consistency flow matching or acceleration techniques.

The taxonomy reveals several neighboring directions. Consistency Flow Matching (4 papers) and Rectified Flow Models (2 papers) pursue similar one-step or few-step generation goals but through different training objectives—consistency distillation or straight-trajectory rectification rather than mean velocity field learning. The Acceleration and Distillation branch (6 papers) addresses speed through post-hoc distillation or variance adaptation, while this work aims for inherent one-step efficiency. The scope notes clarify that mean velocity field methods exclude consistency-based reformulations, positioning MFP as a distinct approach within the flow matching paradigm.

Across the 30 candidates examined (10 per claimed contribution), the core contribution of a mean flow policy for one-step generation shows substantial prior-work overlap: 5 of its 10 candidates appear refutable, indicating that existing methods pursue similar one-step generation goals. The instantaneous velocity constraint (IVC) as a boundary condition shows no clear refutation among its 10 candidates, suggesting this theoretical contribution may be more novel. For the performance claim, 2 of 10 candidates appear refutable, implying that competitive baselines exist even if the specific speedup-accuracy trade-off differs. Because the search is limited to the top-30 semantic matches, these statistics are not exhaustive.

Based on the top-30 candidate analysis, the work appears to offer incremental refinement within an emerging subfield. The mean velocity field formulation has at least one sibling paper and multiple related approaches in neighboring leaves, while the IVC boundary condition lacks clear precedent in the examined literature. The sparse taxonomy leaf suggests room for methodological diversity, though the refutation statistics indicate the core one-step generation goal is not unprecedented.

Taxonomy

Core-task Taxonomy Papers: 28
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 7

Research Landscape Overview

Core task: one-step action generation for robotic manipulation using flow-based policies. The field has coalesced around learning continuous normalizing flows that map noise distributions to action distributions, enabling fast inference without iterative denoising.

The taxonomy reveals several complementary research directions. Flow Matching and Velocity Field Learning focuses on foundational training objectives, such as learning mean velocity fields or consistency-based formulations, that define how the flow evolves from a base distribution to the target policy. Acceleration and Distillation Techniques addresses inference speed by distilling multi-step flows into fewer steps or even single-step generators, often trading off some expressiveness for real-time performance. Multi-Step Flow Inference and Regularization explores structured sampling strategies and constraints that improve stability or sample quality when multiple flow steps are retained. Reinforcement Learning Integration examines how flow-based policies can be optimized via RL objectives, blending generative modeling with value-based or policy-gradient methods. Finally, Application-Specific Flow Policies tailors flow architectures to particular domains such as dexterous manipulation, bimanual coordination, or fabric handling.

Recent work has intensified efforts to achieve true one-step generation while preserving multimodal expressiveness. Mean Flow Velocity[0] and MP1 MeanFlow[4] exemplify mean velocity field approaches that directly predict the average flow direction, enabling efficient single-step inference without sacrificing the ability to capture diverse action modes. In contrast, Flowpolicy Consistency[3] and Flow Single Step[5] pursue consistency distillation or rectified-flow training to collapse multi-step trajectories into a single forward pass, often at the cost of additional training complexity or slight mode-collapse risk. Meanwhile, methods such as Maniflow[6] and FlowRAM[7] incorporate manifold constraints or memory-augmented architectures to handle high-dimensional or temporally extended action spaces. Mean Flow Velocity[0] sits squarely within the mean velocity field cluster, sharing the goal of direct one-step prediction with MP1 MeanFlow[4] but differing in how the mean field is regularized or conditioned on observations, and it offers a streamlined alternative to the iterative refinement seen in consistency-based approaches such as Flowpolicy Consistency[3].
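The flow matching objective that underlies these approaches can be made concrete with a minimal sketch. The linear interpolation path and its velocity target below follow the standard flow-matching/rectified-flow construction and are an illustrative assumption, not the exact formulation of any paper in the taxonomy:

```python
import numpy as np

def linear_interpolation_path(x0, x1, t):
    """Linear probability path z_t between a noise sample x0 and an action x1."""
    return (1.0 - t) * x0 + t * x1

def velocity_target(x0, x1):
    """Instantaneous velocity of the linear path, d z_t / d t = x1 - x0."""
    return x1 - x0

def flow_matching_loss(pred_velocity, x0, x1):
    """Mean squared error between a model's predicted velocity and the target."""
    target = velocity_target(x0, x1)
    return float(np.mean((pred_velocity - target) ** 2))

# Toy check: predicting the exact target velocity yields zero loss.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)   # noise sample
x1 = rng.standard_normal(4)   # action sample
assert flow_matching_loss(velocity_target(x0, x1), x0, x1) == 0.0
```

A one-step sampler is exact only when the learned velocity field is constant along the path, which is what motivates the mean velocity field and consistency approaches surveyed above.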

Claimed Contributions

Mean Flow Policy for one-step action generation

The authors introduce a novel flow-based policy that models the mean velocity field instead of instantaneous velocities. This design enables direct single-step mapping from Gaussian noise to multi-modal action distributions, eliminating the multi-step iterative sampling overhead of existing flow policies while preserving expressive power.

Retrieved papers: 10 · Verdict: Can Refute
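Under the mean velocity field view, one-step generation reduces to a single evaluation of the learned field over the whole interval. The sketch below assumes a MeanFlow-style update z_r = z_t - (t - r) u(z_t, r, t); the function names are illustrative placeholders, not the authors' API:

```python
import numpy as np

def one_step_action(mean_velocity_field, noise):
    """One-step sampling: integrate the mean velocity over the full
    interval [0, 1] in a single update, z_0 = z_1 - (1 - 0) * u(z_1, 0, 1)."""
    return noise - mean_velocity_field(noise, r=0.0, t=1.0)

# Toy mean velocity field whose exact one-step solution maps any noise
# sample to a fixed target action: u(z, r, t) = z - target on [0, 1].
target_action = np.array([0.5, -0.2])
u = lambda z, r, t: z - target_action

noise = np.random.default_rng(1).standard_normal(2)
action = one_step_action(u, noise)
assert np.allclose(action, target_action)
```

In the toy example the field is chosen so the one-step update is exact; for a learned field, the quality of the single step depends on how well the mean velocity is approximated.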
Instantaneous Velocity Constraint as boundary condition

The authors propose IVC, a training objective that pairs average velocity loss with instantaneous velocity loss at interval start points. They theoretically prove this constraint acts as a necessary boundary condition for the mean flow ODE, eliminating solution multiplicity and improving learning accuracy with negligible computational overhead.

Retrieved papers: 10 · Verdict: no clear refutation
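As described, the IVC objective pairs an average-velocity loss with an instantaneous-velocity loss at degenerate intervals r = t, where the mean velocity over [r, t] must equal the instantaneous velocity (the boundary condition u(z, t, t) = v(z, t)). A minimal sketch, with the weighting and the targets as illustrative assumptions rather than the paper's exact objective:

```python
import numpy as np

def ivc_loss(u_model, z_t, r, t, avg_target, inst_target, lam=1.0):
    """Average-velocity loss plus an instantaneous-velocity loss that
    enforces the boundary condition u(z, t, t) = v(z, t)."""
    avg_loss = np.mean((u_model(z_t, r, t) - avg_target) ** 2)
    # At r = t the mean velocity over [r, t] degenerates to the
    # instantaneous velocity, which pins down the boundary condition.
    ivc_term = np.mean((u_model(z_t, t, t) - inst_target) ** 2)
    return float(avg_loss + lam * ivc_term)

# Toy check: a field matching both targets incurs zero loss.
avg_v = np.array([1.0, 0.0])
inst_v = np.array([1.0, 0.0])
u = lambda z, r, t: avg_v
z = np.zeros(2)
assert ivc_loss(u, z, r=0.2, t=0.8, avg_target=avg_v, inst_target=inst_v) == 0.0
```

The extra term costs only one more evaluation of the model per training example, consistent with the claim of negligible computational overhead.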
State-of-the-art performance with substantial speedup

The authors demonstrate that their method achieves top success rates on Robomimic and OGBench benchmarks while delivering significant improvements in training and inference speed compared to multi-step flow policy baselines, highlighting practical applicability for real-time robotic control.

Retrieved papers: 10 · Verdict: Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Mean Flow Policy for one-step action generation
Contribution 2: Instantaneous Velocity Constraint as boundary condition
Contribution 3: State-of-the-art performance with substantial speedup
