DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose DiffusionNFT, a novel online reinforcement learning approach for diffusion models that operates on the forward diffusion process rather than the reverse process. It contrasts positive and negative generations to define an implicit policy improvement direction, naturally incorporating reinforcement signals into the supervised learning objective without requiring likelihood estimation.
The forward-process formulation enables training with any black-box solver (not restricted to first-order SDE samplers), requires only clean images rather than full sampling trajectories for optimization, remains compatible with standard diffusion training pipelines, and naturally supports off-policy learning without importance sampling.
Instead of learning a separate guidance model and applying guided sampling at inference, the method uses an implicit parameterization that integrates reinforcement guidance directly into a single policy model. This allows continual RL training of one model and eliminates the need to combine multiple models during sampling.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
Diffusion Negative-aware FineTuning (DiffusionNFT) paradigm
The authors propose DiffusionNFT, a novel online reinforcement learning approach for diffusion models that operates on the forward diffusion process rather than the reverse process. It contrasts positive and negative generations to define an implicit policy improvement direction, naturally incorporating reinforcement signals into the supervised learning objective without requiring likelihood estimation.
[3] DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
[6] Understanding reinforcement learning-based fine-tuning of diffusion models: A tutorial and review
[8] Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning
[9] Enhancing Sample Efficiency in Online Reinforcement Learning via Policy-Guided Diffusion Models
[35] Offline Reinforcement Learning With Reverse Diffusion Guide Policy
[62] A simple and effective reinforcement learning method for text-to-image diffusion fine-tuning
[63] Controllable guidance in reinforcement learning using diffusion models
[64] DFRL-DS: A Diffusion-based Reinforcement Learning Algorithm in Discrete Actions for Base Station Energy-saving Control
[65] Step-level Reward for Free in RL-based T2I Diffusion Model Fine-tuning
[66] Diffusion models as optimizers for efficient planning in offline RL
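To make the contrastive idea concrete, here is a minimal sketch of a negative-aware denoising objective: positive and negative generations are re-noised through the forward process, and the model is regressed toward the targets of reward-positive samples while being pushed away from those of reward-negative ones. The flow-matching convention, the `beta` weight, and the subtraction-based penalty are illustrative assumptions, not the paper's exact objective (which avoids an unbounded negative term).

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noise(x0, eps, t):
    """Forward process (flow-matching convention): interpolate clean data toward noise."""
    return (1.0 - t) * x0 + t * eps

def velocity_target(x0, eps):
    """Standard flow-matching regression target for a sample noised by forward_noise."""
    return eps - x0

def nft_style_loss(v_pred_pos, v_pred_neg, x0_pos, x0_neg, eps_pos, eps_neg, beta=0.5):
    """Illustrative negative-aware objective: fit reward-positive targets,
    down-weight (push away from) reward-negative ones. Note the raw
    subtraction is unbounded below; practical methods bound or reweight it."""
    pos_loss = np.mean((v_pred_pos - velocity_target(x0_pos, eps_pos)) ** 2)
    neg_loss = np.mean((v_pred_neg - velocity_target(x0_neg, eps_neg)) ** 2)
    return pos_loss - beta * neg_loss
```

Because the loss is an ordinary regression on forward-noised clean samples, it plugs into a standard diffusion training loop with no likelihood or trajectory terms.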
Forward-process RL formulation with practical benefits
The forward-process formulation enables training with any black-box solver (not restricted to first-order SDE samplers), requires only clean images rather than full sampling trajectories for optimization, remains compatible with standard diffusion training pipelines, and naturally supports off-policy learning without importance sampling.
[58] Amortizing intractable inference in diffusion models for vision, language, and control
[59] Diffusion Model for Data-Driven Black-Box Optimization
[60] Diffusion-BBO: Diffusion-Based Inverse Modeling for Online Black-Box Optimization
[61] Integrating Diffusion Models into Model-Based Reinforcement Learning for Real-Time Robotic Control: A Theoretical Review
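The practical consequence of the forward-process formulation can be sketched as a data pipeline: the sampler is treated as a black box, only final clean images and their rewards are retained, and fresh forward-process noise is drawn at training time. The `blackbox_sample` and `reward` functions below are hypothetical stand-ins, and the median-split into positive/negative sets is one simple assumed scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

def blackbox_sample(n, dim):
    """Stand-in for ANY sampler (higher-order ODE, distilled few-step, etc.).
    The data pipeline below never inspects its internals or trajectories."""
    return rng.normal(size=(n, dim))

def reward(x):
    """Hypothetical scalar reward per sample (e.g. from a preference model)."""
    return -np.mean(x ** 2, axis=1)

def make_training_batch(n=16, dim=8):
    """Forward-process RL data: only final clean images and their rewards are
    kept -- no per-step log-probs or trajectories -- so the batch can also be
    replayed off-policy later without importance sampling corrections."""
    x0 = blackbox_sample(n, dim)
    r = reward(x0)
    median = np.median(r)
    positives = x0[r >= median]   # reinforced set
    negatives = x0[r < median]    # suppressed set
    t = rng.uniform(size=(n, 1))  # fresh forward-process noise levels at train time
    return positives, negatives, t
```

Contrast this with reverse-process policy-gradient methods, which must log every intermediate state and step log-probability of the sampling chain.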
Implicit parameterization technique for reinforcement guidance
Instead of learning a separate guidance model and applying guided sampling at inference, the method uses an implicit parameterization that integrates reinforcement guidance directly into a single policy model. This allows continual RL training of one model and eliminates the need to combine multiple models during sampling.
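One way to see the single-model idea is to compare it with explicit guidance, which combines several velocity fields at every sampling step. In the sketch below, `implicit_positive` shows how a reinforced field can be encoded in one trained model relative to a frozen reference, so that sampling only ever evaluates the trained model; the specific affine form and the `beta` scale are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def guided_velocity_two_models(v_base, v_pos, v_neg, w):
    """Explicit guidance: every sampling step must evaluate and combine
    multiple models (base, positive, negative)."""
    return v_base + w * (v_pos - v_neg)

def implicit_positive(v_theta, v_old, beta):
    """Illustrative implicit parameterization: the trained model v_theta
    encodes the reinforced ('positive') field as an offset from the frozen
    reference v_old, so the guided field is recovered from v_theta alone
    at train time and sampling needs only the single model."""
    return v_old + (v_theta - v_old) / beta
```

If training drives `v_theta` toward `v_old + beta * (v_pos - v_old)`, then `implicit_positive` recovers the reinforced field exactly, which is why no multi-model combination is needed at inference.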