Much Ado About Noising: Do Flow Models Actually Make Better Control Policies?

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: Generative model, Flow, Control, Behavior cloning
Abstract:

Generative models, like flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation as to the factors underlying their successes, ranging from capturing multimodal action distributions to expressing more complex behaviors. In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. We find that GCPs do not owe their success to their ability to capture multimodality or to express more complex observation-to-action mappings. Instead, we find that their advantage stems from iterative computation, provided that intermediate steps are supervised during training and this supervision is paired with a suitable level of stochasticity. As a validation of our findings, we show that a minimal iterative policy (MIP), a lightweight two-step regression-based policy, essentially matches the performance of flow GCPs. Our results suggest that the distribution-fitting component of GCPs is less salient than commonly believed and point toward new design spaces focusing solely on control performance. Videos and supplementary materials are available at https://anonymous.4open.science/w/mip-anonymous/.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates why generative control policies (flows and diffusions) succeed in robotic manipulation behavior cloning. It sits within the Generative and Iterative Policy Models leaf, which contains only two papers total. This is a relatively sparse research direction within the broader Policy Architecture and Representation Learning branch, suggesting the specific question of what makes generative policies effective remains underexplored. The paper's core contribution is an empirical decomposition of design factors (multimodality, expressivity, iterative computation) and the proposal of a minimal iterative policy baseline.

The taxonomy reveals neighboring approaches across multiple dimensions. Within Policy Architecture, sibling leaves address Transformer-Based architectures, Object-Centric representations, Latent Representation methods, and Multimodal Action Distribution Modeling, each tackling a complementary aspect of policy design. The Temporal Dynamics branch explores memory mechanisms, while Offline Learning examines data efficiency. The paper's focus on iterative computation connects it to temporal reasoning but diverges by isolating iteration as a design primitive rather than modeling long-horizon dependencies. The taxonomy's scope and exclusion notes clarify that this work addresses architectural mechanisms, not demonstration collection or task decomposition.

Among the 27 candidates examined, none clearly refutes the three main contributions. The taxonomy decomposition was compared against 10 candidates, the Minimal Iterative Policy against 7, and the multimodality/expressivity finding against 10, with zero refutations in each case. This suggests limited prior work directly addressing the same empirical questions within the search scope. However, the small candidate pool and sparse leaf occupancy mean the analysis covers a focused semantic neighborhood rather than exhaustive prior art. The findings appear novel within this limited examination, particularly the claim that iterative computation with supervision, not multimodality, drives generative policy success.

Based on top-27 semantic matches, the work appears to occupy a distinct position questioning assumptions about generative models in manipulation. The sparse leaf and zero refutations across contributions suggest novelty, though the limited search scope leaves open whether broader literature contains overlapping insights. The taxonomy context indicates this is an emerging rather than saturated research direction, with substantial room for further investigation into minimal design principles for control policies.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: behavior cloning for robotic manipulation. The field encompasses a diverse set of approaches organized around how policies are represented and learned from demonstrations. At the highest level, the taxonomy distinguishes between:

- architectural choices (Policy Architecture and Representation Learning[2]),
- strategies for breaking down complex tasks (Task Decomposition and Hierarchical Learning[8]),
- the nature of demonstration data (Demonstration Modality and Data Collection[9]),
- interactive refinement methods (Interactive and Corrective Learning[13]),
- domain-specific challenges (Specialized Manipulation Contexts[33]),
- language grounding (Language-Conditioned Policy Learning[1]),
- continual adaptation (Continual and Meta-Learning for Manipulation[5]),
- offline data utilization (Offline Learning and Data Efficiency[6,7]),
- temporal reasoning (Temporal Dynamics and Memory-Based Learning[29]),
- robustness concerns (Generalization and Robustness Analysis[24]),
- multi-task coordination (Multi-Task and Task Planning Integration[44]),
- overarching frameworks (Methodological Frameworks and Practical Roadmaps[17]).

Within Policy Architecture and Representation Learning, a key branch focuses on Generative and Iterative Policy Models, exploring how expressive probabilistic models can capture multimodal action distributions and enable iterative refinement during execution. Recent work in generative policy modeling has explored trade-offs between expressiveness, sample efficiency, and computational cost. Flow Models Control[0] sits within the Generative and Iterative Policy Models branch, emphasizing continuous normalizing flows to represent complex action distributions for manipulation tasks. This contrasts with nearby approaches like Predictive Inverse Dynamics[40], which leverages inverse models to infer actions from predicted future states, offering a different angle on iterative policy refinement.

Both methods address the challenge of capturing multimodal behaviors inherent in demonstration data, yet differ in their underlying generative mechanisms and how they handle temporal dependencies. Open questions persist around scaling these generative architectures to long-horizon tasks, balancing model complexity with real-time control requirements, and ensuring robust generalization across varied manipulation contexts.

Claimed Contributions

Taxonomy of generative control policy design components

The authors propose a systematic framework that decomposes generative control policies into three key components: distributional learning (matching conditional action distributions), stochasticity injection (adding noise during training), and supervised iterative computation (multi-step generation with supervision at each step). This taxonomy enables principled ablation studies to understand which components drive performance.

10 retrieved papers
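The three-component decomposition above can be pictured as a small design matrix. The sketch below is illustrative only: the flag names and the policy-class assignments are a paraphrase of this report's description, not code or terminology from the paper itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GCPDesign:
    """One point in the hypothesized design space of generative control policies."""
    distributional_learning: bool  # loss matches the conditional action distribution
    stochasticity_injection: bool  # noise is added during training
    supervised_iteration: bool     # multi-step generation, supervised at each step

# Illustrative assignments implied by the claimed contributions:
FLOW_GCP = GCPDesign(distributional_learning=True,
                     stochasticity_injection=True,
                     supervised_iteration=True)
REGRESSION = GCPDesign(distributional_learning=False,
                       stochasticity_injection=False,
                       supervised_iteration=False)
MIP = GCPDesign(distributional_learning=False,
                stochasticity_injection=True,
                supervised_iteration=True)
```

Under this framing, the paper's ablations correspond to flipping one flag at a time and measuring control performance, with MIP occupying the corner that keeps iteration and noise but drops the distributional objective.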
Minimal Iterative Policy (MIP)

The authors introduce MIP, a lightweight two-step regression-based policy that combines stochasticity injection and supervised iterative computation without distributional learning. MIP achieves performance comparable to flow-based generative control policies across state, pixel, and point-cloud benchmarks while being substantially simpler.

7 retrieved papers
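As a rough illustration of what a two-step regression policy of this kind could look like, here is a minimal NumPy sketch. All names, network sizes, and the noise scale are assumptions made for illustration; the actual MIP architecture and training details are specified in the paper, and the gradient-based training loop is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes, rng):
    """Initialize a small MLP as a list of (W, b) layer parameters."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Forward pass with tanh hidden activations and a linear output layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

# Hypothetical dimensions for a toy manipulation setup.
obs_dim, act_dim, noise_dim, hidden = 4, 2, 2, 32
step1 = mlp_init([obs_dim + noise_dim, hidden, act_dim], rng)  # (obs, noise) -> coarse action
step2 = mlp_init([obs_dim + act_dim, hidden, act_dim], rng)    # (obs, coarse) -> refined action

def mip_forward(obs, sigma=0.1):
    """Two generation steps: stochasticity injection, then supervised refinement."""
    z = sigma * rng.normal(size=(obs.shape[0], noise_dim))  # stochasticity injection
    a1 = mlp(step1, np.concatenate([obs, z], axis=1))       # iteration step 1
    a2 = mlp(step2, np.concatenate([obs, a1], axis=1))      # iteration step 2
    return a1, a2

# Both steps are supervised with a plain regression loss against the demo action;
# no distributional (flow/diffusion) objective is involved.
obs = rng.normal(size=(8, obs_dim))
a_demo = rng.normal(size=(8, act_dim))
a1, a2 = mip_forward(obs)
loss = np.mean((a1 - a_demo) ** 2) + np.mean((a2 - a_demo) ** 2)
```

The point of the sketch is structural: each intermediate step receives its own regression target, and stochasticity enters only through the injected noise `z`, matching the two ingredients the paper credits for GCP performance.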
Empirical finding that multimodality and expressivity do not explain GCP success

Through comprehensive benchmarking and analysis, the authors demonstrate that the advantage of generative control policies over regression policies does not stem from capturing multimodal action distributions or from expressing more complex functions. Instead, it comes from combining supervised iterative computation with stochasticity injection during training.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Taxonomy of generative control policy design components

The authors propose a systematic framework that decomposes generative control policies into three key components: distributional learning (matching conditional action distributions), stochasticity injection (adding noise during training), and supervised iterative computation (multi-step generation with supervision at each step). This taxonomy enables principled ablation studies to understand which components drive performance.

Contribution

Minimal Iterative Policy (MIP)

The authors introduce MIP, a lightweight two-step regression-based policy that combines stochasticity injection and supervised iterative computation without distributional learning. MIP achieves performance comparable to flow-based generative control policies across state, pixel, and point-cloud benchmarks while being substantially simpler.

Contribution

Empirical finding that multimodality and expressivity do not explain GCP success

Through comprehensive benchmarking and analysis, the authors demonstrate that the advantage of generative control policies over regression policies does not stem from capturing multimodal action distributions or from expressing more complex functions. Instead, it comes from combining supervised iterative computation with stochasticity injection during training.