GLASS Flows: Efficient Inference for Reward Alignment of Flow and Diffusion Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Flow Matching; Diffusion Models; Reward Alignment; Reward Adaptation; Inference-time Scaling; Feynman-Kac Steering; Markov Transitions; Sampling Methods
Abstract:

The performance of flow matching and diffusion models can be greatly improved at inference time using reward adaptation algorithms, yet efficiency remains a major limitation. While several such algorithms have been proposed, we demonstrate that a common bottleneck is the sampling method they rely on: many require sampling Markov transitions via SDE sampling, which is significantly less efficient and often less performant than ODE sampling. To remove this bottleneck, we introduce GLASS Flows, a new sampling paradigm that simulates a "flow matching model within a flow matching model" to sample Markov transitions. As we show in this work, this "inner" flow matching model can be retrieved from any pre-trained model without re-training, effectively combining the efficiency of ODEs with the stochastic evolution of SDEs. On large-scale text-to-image models, we show that GLASS Flows eliminate the trade-off between stochastic evolution and efficiency. GLASS Flows improve state-of-the-art performance in text-to-image generation, making them a simple, drop-in solution for inference-time scaling of flow and diffusion models.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces GLASS Flows, a sampling paradigm that simulates a 'flow matching model within a flow matching model' to enable efficient Markov transitions for reward adaptation algorithms. It resides in the Hybrid and Unified Alignment Frameworks leaf, which contains only three papers total, indicating a relatively sparse research direction. This leaf focuses on methods that combine training-time and inference-time strategies, distinguishing it from purely inference-based guidance or purely training-based fine-tuning approaches that dominate other branches of the taxonomy.

The taxonomy reveals substantial activity in neighboring areas: Inference-Time Alignment Methods includes six gradient-based guidance papers and six sampling-based alignment papers, while Training-Based Alignment Methods spans multiple subtopics with over twenty papers across RL fine-tuning and preference optimization. GLASS Flows bridges these domains by addressing a bottleneck in inference-time reward adaptation—specifically, the inefficiency of SDE sampling—while maintaining compatibility with pre-trained models. The scope notes clarify that hybrid frameworks must integrate both paradigms, whereas purely inference-based methods (e.g., gradient guidance) or purely training-based methods (e.g., policy gradient fine-tuning) belong elsewhere.

Among the three contributions analyzed, none were clearly refuted by the twenty-nine candidates examined. The first contribution (GLASS Flows sampling paradigm) examined ten candidates with zero refutable overlaps; the second (efficient ODE-based transition sampling) examined nine candidates with zero refutations; the third (application to inference-time reward alignment) examined ten candidates with zero refutations. This suggests that within the limited search scope—primarily top-K semantic matches and citation expansion—no prior work directly anticipates the specific combination of flow-within-flow sampling and ODE-based Markov transitions.

Based on the limited literature search of twenty-nine candidates, the work appears to occupy a distinct position within the sparse hybrid alignment space. The analysis does not exhaustively cover all inference-time or training-based methods, nor does it examine unpublished or domain-specific variants. The contribution-level statistics indicate no immediate overlap with prior work among the examined candidates, though the small size of the hybrid frameworks leaf and the modest search scope leave open the possibility of related techniques in adjacent branches or in literature that was not retrieved.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 0

Research Landscape Overview

Core task: Reward alignment of flow and diffusion models. The field has organized itself around several complementary strategies for steering generative models toward desired behaviors. Inference-Time Alignment Methods adjust sampling procedures without retraining, offering flexibility at the cost of computational overhead during generation. Training-Based Alignment Methods modify model parameters through reinforcement learning or preference optimization, embedding reward signals directly into the generative process. Hybrid and Unified Alignment Frameworks combine both paradigms, seeking to balance training efficiency with inference-time adaptability. Additional branches address Reward Hacking and Over-Optimization Mitigation to prevent degenerate solutions, Specialized Alignment Applications targeting domains like video or molecular design, and Accelerated and Efficient Alignment techniques that reduce computational burdens. Survey and Review Papers synthesize emerging best practices, while Auxiliary and Related Methods explore connections to broader generative modeling principles.

Recent work reveals tension between sample quality and computational cost, with some studies favoring lightweight inference-time guidance (Inference-Time RL Guidance[2], Training-Free Alignment[49]) and others advocating for parameter updates that internalize reward structure (Flow GRPO[6], ImageReFL[7]). A particularly active line explores how to handle sparse or multi-objective rewards without collapsing diversity (Sparse Reward Alignment[3], Reward-Diversity Tradeoffs[17]). GLASS Flows[0] sits within the Hybrid and Unified Alignment Frameworks branch, closely related to GLASS Transition Sampling[14] and ReALM-GEN[27], emphasizing a principled integration of flow-matching dynamics with reward-driven adjustments.
Compared to purely inference-based approaches like Inference-Time RL Guidance[2], GLASS Flows[0] offers tighter coupling between training and sampling, while differing from purely training-centric methods by retaining inference-time flexibility. This positioning reflects ongoing efforts to unify the strengths of both paradigms without incurring prohibitive over-optimization or computational expense.

Claimed Contributions

GLASS Flows sampling paradigm for Markov transitions

The authors propose GLASS Flows, a method that constructs an inner flow matching model to sample Markov transitions from pre-trained flow and diffusion models without retraining. This approach combines the efficiency of ODEs with the stochastic evolution characteristic of SDEs by using sufficient statistics to transform pre-trained models.

10 retrieved papers
Efficient transition sampling via ODEs without SDE bottleneck

The method eliminates the common bottleneck in reward alignment algorithms by enabling efficient sampling of Markov transitions using ODEs rather than slower SDE sampling. This is achieved by retrieving an inner flow matching model from pre-trained models without additional training.
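The efficiency gap this contribution targets can be illustrated with a toy, self-contained sketch (not the paper's method): a deterministic Euler ODE sampler versus a stochastic Euler-Maruyama SDE sampler for a one-dimensional flow. The `velocity` field, target value, noise scale, and step counts are all illustrative assumptions; a real model would replace `velocity` with a learned network.

```python
import math
import random

def velocity(x, t):
    # Toy velocity field for a linear (rectified-flow-style) interpolant
    # transporting samples toward x = 2 as t goes from 0 to 1.
    # Purely illustrative; stands in for a learned vector field.
    target = 2.0
    return (target - x) / max(1.0 - t, 1e-3)

def ode_sample(x0, steps=10):
    # Deterministic Euler integration of dx = v(x, t) dt.
    # Few steps suffice because the trajectory is smooth.
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += velocity(x, i * dt) * dt
    return x

def sde_sample(x0, steps=100, sigma=0.5, seed=0):
    # Euler-Maruyama integration of dx = v(x, t) dt + sigma dW.
    # The injected Brownian noise typically forces a much finer step
    # size than the ODE for comparable accuracy.
    rng = random.Random(seed)
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += velocity(x, i * dt) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x
```

Both samplers end near the target, but the SDE run uses ten times as many function evaluations and still produces a noisy endpoint; this is the kind of cost that motivates replacing SDE transition sampling with ODE integration.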

9 retrieved papers
Application to inference-time reward alignment with state-of-the-art performance

The authors demonstrate that GLASS Flows, when combined with Feynman-Kac Steering, achieve state-of-the-art performance in text-to-image generation. The method serves as a drop-in solution for inference-time reward alignment algorithms that previously relied on inefficient SDE sampling.
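Feynman-Kac steering of the kind referenced here is commonly realized as a sequential-Monte-Carlo loop: propagate a population of samples, weight each by an exponentiated reward potential, and resample so that high-reward trajectories are duplicated. The sketch below is a generic illustration of that scheme, not the paper's algorithm; the quadratic reward, the random-walk `sampler_step`, and all constants are assumptions for the demo.

```python
import math
import random

def fk_steering(particles, reward, sampler_step, n_steps, seed=0):
    # Generic Feynman-Kac / SMC steering loop (illustrative):
    # 1) propagate each particle one transition,
    # 2) weight particles by exp(reward),
    # 3) resample proportionally to the weights.
    rng = random.Random(seed)
    for step in range(n_steps):
        particles = [sampler_step(x, step) for x in particles]
        weights = [math.exp(reward(x)) for x in particles]
        total = sum(weights)
        probs = [w / total for w in weights]
        particles = rng.choices(particles, weights=probs, k=len(particles))
    return particles

# Toy demo: steer a Gaussian random walk toward a quadratic reward
# peaked at x = 0 (all values are illustrative assumptions).
walk_rng = random.Random(1)
steered = fk_steering(
    [0.0] * 64,
    reward=lambda x: -x * x,
    sampler_step=lambda x, step: x + walk_rng.gauss(0.0, 0.5),
    n_steps=10,
)
```

The point of contact with this contribution is the `sampler_step` argument: algorithms of this family need many independent Markov transitions per step, so replacing an SDE transition sampler with an efficient ODE-based one directly reduces the dominant cost of the loop.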

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

GLASS Flows sampling paradigm for Markov transitions

The authors propose GLASS Flows, a method that constructs an inner flow matching model to sample Markov transitions from pre-trained flow and diffusion models without retraining. This approach combines the efficiency of ODEs with the stochastic evolution characteristic of SDEs by using sufficient statistics to transform pre-trained models.

Contribution

Efficient transition sampling via ODEs without SDE bottleneck

The method eliminates the common bottleneck in reward alignment algorithms by enabling efficient sampling of Markov transitions using ODEs rather than slower SDE sampling. This is achieved by retrieving an inner flow matching model from pre-trained models without additional training.

Contribution

Application to inference-time reward alignment with state-of-the-art performance

The authors demonstrate that GLASS Flows, when combined with Feynman-Kac Steering, achieve state-of-the-art performance in text-to-image generation. The method serves as a drop-in solution for inference-time reward alignment algorithms that previously relied on inefficient SDE sampling.