GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Flow Matching; Diffusion Models; Reward Alignment; Reward Adaptation; Inference-time Scaling; Feynman-Kac Steering; Markov Transitions; Sampling Methods
Abstract:

The performance of flow matching and diffusion models can be greatly improved at inference time using reward adaptation algorithms, yet efficiency remains a major limitation. While several such algorithms have been proposed, we demonstrate that a common bottleneck is the sampling method they rely on: many require sampling Markov transitions via SDE sampling, which is significantly less efficient and often less performant than ODE sampling. To remove this bottleneck, we introduce GLASS Flows, a new sampling paradigm that simulates a "flow matching model within a flow matching model" to sample Markov transitions. As we show in this work, this "inner" flow matching model can be retrieved from any pre-trained model without re-training, effectively combining the efficiency of ODEs with the stochastic evolution of SDEs. On large-scale text-to-image models, we show that GLASS Flows eliminate the trade-off between stochastic evolution and efficiency. GLASS Flows improve state-of-the-art performance in text-to-image generation, making them a simple, drop-in solution for inference-time scaling of flow and diffusion models.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces GLASS Flows, a sampling paradigm that simulates a 'flow matching model within a flow matching model' to enable efficient Markov transitions for reward adaptation algorithms. It resides in the Hybrid and Unified Alignment Frameworks leaf, which contains only three papers total, indicating a relatively sparse research direction. This leaf focuses on methods that combine training-time and inference-time strategies, distinguishing it from purely inference-based guidance or purely training-based fine-tuning approaches that dominate other branches of the taxonomy.

The taxonomy reveals substantial activity in neighboring areas: Inference-Time Alignment Methods includes six gradient-based guidance papers and six sampling-based alignment papers, while Training-Based Alignment Methods spans multiple subtopics with over twenty papers across RL fine-tuning and preference optimization. GLASS Flows bridges these domains by addressing a bottleneck in inference-time reward adaptation—specifically, the inefficiency of SDE sampling—while maintaining compatibility with pre-trained models. The scope notes clarify that hybrid frameworks must integrate both paradigms, whereas purely inference-based methods (e.g., gradient guidance) or purely training-based methods (e.g., policy gradient fine-tuning) belong elsewhere.

Among the three contributions analyzed, none were clearly refuted by the twenty-nine candidates examined. The first contribution (GLASS Flows sampling paradigm) examined ten candidates with zero refutable overlaps; the second (efficient ODE-based transition sampling) examined nine candidates with zero refutations; the third (application to inference-time reward alignment) examined ten candidates with zero refutations. This suggests that within the limited search scope—primarily top-K semantic matches and citation expansion—no prior work directly anticipates the specific combination of flow-within-flow sampling and ODE-based Markov transitions.

Based on the limited literature search of twenty-nine candidates, the work appears to occupy a distinct position within the sparse hybrid alignment space. The analysis does not exhaustively cover all inference-time or training-based methods, nor does it examine unpublished or domain-specific variants. The contribution-level statistics indicate no immediate overlap with prior work among the examined candidates, though the small size of the hybrid frameworks leaf and the modest search scope leave open the possibility of related techniques in adjacent branches or in literature that was not retrieved.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 0

Research Landscape Overview

Core task: Reward alignment of flow and diffusion models. The field has organized itself around several complementary strategies for steering generative models toward desired behaviors. Inference-Time Alignment Methods adjust sampling procedures without retraining, offering flexibility at the cost of computational overhead during generation. Training-Based Alignment Methods modify model parameters through reinforcement learning or preference optimization, embedding reward signals directly into the generative process. Hybrid and Unified Alignment Frameworks combine both paradigms, seeking to balance training efficiency with inference-time adaptability. Additional branches address Reward Hacking and Over-Optimization Mitigation to prevent degenerate solutions, Specialized Alignment Applications targeting domains like video or molecular design, and Accelerated and Efficient Alignment techniques that reduce computational burdens. Survey and Review Papers synthesize emerging best practices, while Auxiliary and Related Methods explore connections to broader generative modeling principles.

Recent work reveals tension between sample quality and computational cost, with some studies favoring lightweight inference-time guidance (Inference-Time RL Guidance[2], Training-Free Alignment[49]) and others advocating for parameter updates that internalize reward structure (Flow GRPO[6], ImageReFL[7]). A particularly active line explores how to handle sparse or multi-objective rewards without collapsing diversity (Sparse Reward Alignment[3], Reward-Diversity Tradeoffs[17]). GLASS Flows[0] sits within the Hybrid and Unified Alignment Frameworks branch, closely related to GLASS Transition Sampling[14] and ReALM-GEN[27], emphasizing a principled integration of flow-matching dynamics with reward-driven adjustments.
Compared to purely inference-based approaches like Inference-Time RL Guidance[2], GLASS Flows[0] offers tighter coupling between training and sampling, while differing from purely training-centric methods by retaining inference-time flexibility. This positioning reflects ongoing efforts to unify the strengths of both paradigms without incurring prohibitive over-optimization or computational expense.

Claimed Contributions

GLASS Flows sampling paradigm for Markov transitions

The authors propose GLASS Flows, a method that constructs an inner flow matching model to sample Markov transitions from pre-trained flow and diffusion models without retraining. This approach combines the efficiency of ODEs with the stochastic evolution characteristic of SDEs by using sufficient statistics to transform pre-trained models.

10 retrieved papers
Efficient transition sampling via ODEs without SDE bottleneck

The method eliminates the common bottleneck in reward alignment algorithms by enabling efficient sampling of Markov transitions using ODEs rather than slower SDE sampling. This is achieved by retrieving an inner flow matching model from pre-trained models without additional training.
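The efficiency gap this contribution targets can be illustrated with a toy, self-contained sketch (not the paper's method): a deterministic Euler ODE sampler versus a stochastic Euler-Maruyama SDE sampler for a one-dimensional flow. The `velocity` field, target value, noise scale, and step counts are all illustrative assumptions; a real model would replace `velocity` with a learned network.

```python
import math
import random

def velocity(x, t):
    # Toy velocity field for a linear (rectified-flow-style) interpolant
    # transporting samples toward x = 2 as t goes from 0 to 1.
    # Purely illustrative; stands in for a learned vector field.
    target = 2.0
    return (target - x) / max(1.0 - t, 1e-3)

def ode_sample(x0, steps=10):
    # Deterministic Euler integration of dx = v(x, t) dt.
    # Few steps suffice because the trajectory is smooth.
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += velocity(x, i * dt) * dt
    return x

def sde_sample(x0, steps=100, sigma=0.5, seed=0):
    # Euler-Maruyama integration of dx = v(x, t) dt + sigma dW.
    # The injected Brownian noise typically forces a much finer step
    # size than the ODE for comparable accuracy.
    rng = random.Random(seed)
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += velocity(x, i * dt) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x
```

Both samplers end near the target, but the SDE run uses ten times as many function evaluations and still produces a noisy endpoint; this is the kind of cost that motivates replacing SDE transition sampling with ODE integration.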

9 retrieved papers
Application to inference-time reward alignment with state-of-the-art performance

The authors demonstrate that GLASS Flows, when combined with Feynman-Kac Steering, achieve state-of-the-art performance in text-to-image generation. The method serves as a drop-in solution for inference-time reward alignment algorithms that previously relied on inefficient SDE sampling.
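Feynman-Kac steering of the kind referenced here is commonly realized as a sequential-Monte-Carlo loop: propagate a population of samples, weight each by an exponentiated reward potential, and resample so that high-reward trajectories are duplicated. The sketch below is a generic illustration of that scheme, not the paper's algorithm; the quadratic reward, the random-walk `sampler_step`, and all constants are assumptions for the demo.

```python
import math
import random

def fk_steering(particles, reward, sampler_step, n_steps, seed=0):
    # Generic Feynman-Kac / SMC steering loop (illustrative):
    # 1) propagate each particle one transition,
    # 2) weight particles by exp(reward),
    # 3) resample proportionally to the weights.
    rng = random.Random(seed)
    for step in range(n_steps):
        particles = [sampler_step(x, step) for x in particles]
        weights = [math.exp(reward(x)) for x in particles]
        total = sum(weights)
        probs = [w / total for w in weights]
        particles = rng.choices(particles, weights=probs, k=len(particles))
    return particles

# Toy demo: steer a Gaussian random walk toward a quadratic reward
# peaked at x = 0 (all values are illustrative assumptions).
walk_rng = random.Random(1)
steered = fk_steering(
    [0.0] * 64,
    reward=lambda x: -x * x,
    sampler_step=lambda x, step: x + walk_rng.gauss(0.0, 0.5),
    n_steps=10,
)
```

The point of contact with this contribution is the `sampler_step` argument: algorithms of this family need many independent Markov transitions per step, so replacing an SDE transition sampler with an efficient ODE-based one directly reduces the dominant cost of the loop.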

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

GLASS Flows sampling paradigm for Markov transitions

The authors propose GLASS Flows, a method that constructs an inner flow matching model to sample Markov transitions from pre-trained flow and diffusion models without retraining. This approach combines the efficiency of ODEs with the stochastic evolution characteristic of SDEs by using sufficient statistics to transform pre-trained models.

Contribution

Efficient transition sampling via ODEs without SDE bottleneck

The method eliminates the common bottleneck in reward alignment algorithms by enabling efficient sampling of Markov transitions using ODEs rather than slower SDE sampling. This is achieved by retrieving an inner flow matching model from pre-trained models without additional training.

Contribution

Application to inference-time reward alignment with state-of-the-art performance

The authors demonstrate that GLASS Flows, when combined with Feynman-Kac Steering, achieve state-of-the-art performance in text-to-image generation. The method serves as a drop-in solution for inference-time reward alignment algorithms that previously relied on inefficient SDE sampling.