Flow Matching with Injected Noise for Offline-to-Online Reinforcement Learning
Overview
Overall Novelty Assessment
The paper proposes FINO, a method combining noise injection and entropy-guided sampling to improve offline-to-online reinforcement learning with flow matching policies. According to the taxonomy, this work resides in the 'Noise Injection and Entropy-Guided Exploration' leaf under 'Offline-to-Online Transition and Exploration'. Notably, this leaf contains only the original paper itself—no sibling papers are listed—suggesting this specific combination of noise injection and entropy guidance represents a relatively sparse research direction within the broader offline-to-online flow matching landscape.
The taxonomy reveals that FINO's parent branch ('Offline-to-Online Transition and Exploration') contains three other leaves: adaptive post-training for vision-language-action models, action chunking for sample-efficient fine-tuning, and unified online-offline learning via implicit regularization. These neighboring directions address similar offline-to-online challenges but through different mechanisms—VLA-specific objectives, temporally extended actions, or implicit value regularization. The broader taxonomy shows related work in energy-guided training and critic design, but these focus on training-time objectives or value learning rather than exploration strategies during online fine-tuning, clarifying FINO's distinct positioning.
Among the 29 candidates examined, the noise-injected training scheme (Contribution 2) has one refutable candidate out of 10, indicating that some prior work on noise-based exploration exists within the limited search scope. The overall FINO framework (Contribution 1) and entropy-guided sampling (Contribution 3) show no refutable candidates among 9 and 10 candidates, respectively, suggesting these specific combinations are less directly addressed within the top-30 semantic matches. These statistics indicate moderate prior-work overlap for the noise injection component, while the integrated framework and the entropy mechanism appear more distinctive within this limited candidate pool.
Based on the 29 candidates examined, FINO appears to occupy a relatively unexplored niche: combining noise injection with entropy-guided sampling for flow-based offline-to-online RL. The taxonomy structure confirms this is a sparse leaf with no listed siblings, though the limited search scope means potentially relevant work outside the top-K matches remains unexamined. The analysis covers semantic proximity and citation-based expansion, but it does not constitute an exhaustive survey of the field.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce FINO, a method that injects noise into flow matching during offline pre-training to expand the action space beyond the dataset, enabling more effective exploration during subsequent online fine-tuning in reinforcement learning.
The authors propose a training objective that injects controlled noise into the flow matching formulation, encouraging the policy to explore a broader range of actions beyond those present in the offline dataset while maintaining valid continuous normalizing flows.
The authors introduce a sampling mechanism that constructs a distribution over candidate actions based on their action-values and dynamically adjusts a temperature parameter using policy entropy, enabling adaptive balancing between exploration and exploitation during online fine-tuning.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
Flow Matching with Injected Noise for Offline-to-Online RL (FINO)
The authors introduce FINO, a method that injects noise into flow matching during offline pre-training to expand the action space beyond the dataset, enabling more effective exploration during subsequent online fine-tuning in reinforcement learning.
[1] EXPO: Stable Reinforcement Learning with Expressive Policies
[2] FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning
[3] SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
[4] Flow Q-Learning
[8] Balancing Signal and Variance: Adaptive Offline RL Post-Training for VLA Flow Models
[13] Extremum Flow Matching for Offline Goal-Conditioned Reinforcement Learning
[14] Energy-Weighted Flow Matching for Offline Reinforcement Learning
[15] Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
[23] Bayesian Design Principles for Offline-to-Online Reinforcement Learning
Noise-injected training scheme for flow matching
The authors propose a training objective that injects controlled noise into the flow matching formulation, encouraging the policy to explore a broader range of actions beyond those present in the offline dataset while maintaining valid continuous normalizing flows.
[25] ReinFlow: Fine-Tuning Flow Matching Policy with Online Reinforcement Learning
[24] Flow-GRPO: Training Flow Matching Models via Online RL
[26] Stochastic Flow Matching for Resolving Small-Scale Physics
[27] On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
[28] TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
[29] Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching
[30] Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling
[31] ActionFlow: Efficient, Accurate, and Fast Policies with Spatially Symmetric Flow Matching
[32] ARFlow: Human Action-Reaction Flow Matching with Physical Guidance
[33] OSCAR: Orthogonal Stochastic Control for Alignment-Respecting Diversity in Flow Matching
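To make the claimed mechanism concrete, the noise-injected training objective can be sketched as follows. This is a minimal NumPy illustration, not the paper's formulation: the Gaussian perturbation of dataset actions, the `sigma` scale, and the linear (rectified-flow-style) probability path are all assumptions introduced here.

```python
import numpy as np

def noise_injected_cfm_loss(velocity_fn, states, actions, sigma=0.1, rng=None):
    """One plausible noise-injected conditional flow matching loss.

    Standard CFM regresses a velocity field toward (a1 - a0) along the
    straight-line path x_t = (1 - t) * a0 + t * a1. Here the dataset
    action a1 is perturbed with Gaussian noise before building the path,
    widening the action distribution the policy is trained to reach.
    """
    rng = np.random.default_rng(rng)
    a0 = rng.standard_normal(actions.shape)                    # base noise sample
    a1 = actions + sigma * rng.standard_normal(actions.shape)  # injected noise
    t = rng.uniform(size=(actions.shape[0], 1))                # flow time in [0, 1]
    x_t = (1.0 - t) * a0 + t * a1                              # point on the probability path
    target_v = a1 - a0                                         # straight-line target velocity
    pred_v = velocity_fn(states, x_t, t)
    return np.mean((pred_v - target_v) ** 2)

# Toy usage with a zero velocity predictor (illustrative only).
states = np.zeros((4, 3))
actions = np.ones((4, 2))
loss = noise_injected_cfm_loss(lambda s, x, t: np.zeros_like(x),
                               states, actions, rng=0)
```

With `sigma = 0`, this reduces to a standard conditional flow matching loss on the dataset actions; increasing `sigma` widens the target distribution while the objective remains a valid flow matching regression.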
Entropy-guided sampling mechanism
The authors introduce a sampling mechanism that constructs a distribution over candidate actions based on their action-values and dynamically adjusts a temperature parameter using policy entropy, enabling adaptive balancing between exploration and exploitation during online fine-tuning.
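A minimal NumPy sketch of what such a mechanism could look like is given below. The softmax over Q-values, the multiplicative temperature update, and the `target_entropy` and `gain` constants are illustrative assumptions, not FINO's actual rule.

```python
import numpy as np

def entropy_guided_select(candidates, q_values, base_temp=1.0,
                          target_entropy=1.0, gain=0.5, rng=None):
    """Pick one of the candidate actions via a Q-weighted softmax whose
    temperature is adapted from the selection distribution's entropy.
    Returns the chosen action and the updated temperature for the next call.
    """
    rng = np.random.default_rng(rng)
    logits = q_values / max(base_temp, 1e-6)
    probs = np.exp(logits - logits.max())          # numerically stable softmax
    probs /= probs.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    # Below-target entropy (near-deterministic selection) raises the
    # temperature toward exploration; above-target entropy lowers it
    # toward exploitation of high-value candidates.
    new_temp = base_temp * np.exp(gain * (target_entropy - entropy))
    idx = rng.choice(len(candidates), p=probs)
    return candidates[idx], new_temp

# Toy usage: three candidate actions scored by a critic (values made up).
cands = np.array([[0.0], [0.5], [1.0]])
q = np.array([0.1, 0.2, 1.5])
action, new_temp = entropy_guided_select(cands, q, rng=0)
```

Feeding the returned temperature back in as `base_temp` on the next step yields the adaptive exploration-exploitation balance the contribution describes: a collapsing distribution flattens over time, while an overly diffuse one sharpens.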