Terminal Velocity Matching

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: one-step generative model from scratch, diffusion, flow matching
Abstract:

We propose Terminal Velocity Matching (TVM), a generalization of flow matching that enables high-fidelity one- and few-step generative modeling. TVM models the transition between any two diffusion timesteps and regularizes its behavior at the terminal time rather than at the initial time. We prove that TVM provides an upper bound on the 2-Wasserstein distance between the data and model distributions when the model is Lipschitz continuous. However, since Diffusion Transformers lack this property, we introduce minimal architectural changes that achieve stable, single-stage training. To make TVM efficient in practice, we develop a fused attention kernel that supports backward passes through Jacobian-vector products and scales well with transformer architectures. On ImageNet-256x256, TVM achieves 3.29 FID with a single function evaluation (NFE) and 1.99 FID with 4 NFEs. It similarly achieves 4.32 1-NFE FID and 2.94 4-NFE FID on ImageNet-512x512, representing state-of-the-art performance for one/few-step models trained from scratch.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Terminal Velocity Matching (TVM), a framework that models transitions between arbitrary diffusion timesteps and regularizes terminal-time behavior to enable one-step and few-step generation. It resides in the Mean Flow and Average Velocity Modeling leaf, which contains five papers exploring time-averaged velocity fields and direct noise-to-data mappings. This leaf sits within the Core Flow Matching Frameworks branch, indicating the work addresses foundational training objectives rather than distillation or domain-specific adaptations. The leaf's moderate size suggests an active but not overcrowded research direction focused on trajectory straightening through velocity averaging.

The taxonomy reveals closely related directions in neighboring leaves. Flow Map and Transition Modeling (three papers) learns two-time operators rather than instantaneous velocities, offering a conceptual parallel to TVM's multi-timestep transitions. Trajectory Optimization and Straightening (three papers) pursues straighter paths through geometric objectives, while Velocity Field Learning (four papers) focuses on standard instantaneous flow matching. The Distillation and Acceleration branch (eleven papers across four leaves) addresses step reduction through teacher-student frameworks, contrasting with TVM's single-stage training approach. These boundaries clarify that TVM occupies a niche between pure velocity modeling and explicit distillation methods.

Among twenty-six candidates examined, the TVM framework and Wasserstein bound contributions each show one refutable candidate from ten examined, suggesting some overlap with prior theoretical or methodological work in terminal-time regularization or transport bounds. The fused attention kernel contribution examined six candidates with none refutable, indicating greater technical novelty in the implementation domain. The limited search scope means these statistics reflect top-K semantic matches rather than exhaustive coverage. The framework contribution appears to build incrementally on existing mean-flow ideas, while the kernel optimization addresses a distinct computational bottleneck with less prior work.

Based on the twenty-six candidates examined, TVM demonstrates moderate novelty within its leaf, combining terminal-time regularization with architectural modifications for stable training. The theoretical bound and framework design show measurable overlap with prior transport-based approaches, while the attention kernel represents a more specialized contribution. The analysis covers semantic neighbors and citation-expanded papers but does not claim exhaustive field coverage, leaving open the possibility of additional related work in adjacent research communities or recent preprints.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 2

Research Landscape Overview

Core task: one-step and few-step generative modeling with flow matching. The field has organized itself around several complementary directions. Core Flow Matching Frameworks and Training Objectives establish foundational methods for learning velocity fields and optimal transport paths, often exploring different coupling strategies and mean-flow formulations to achieve straighter trajectories. Distillation and Acceleration Techniques focus on compressing multi-step flows into efficient one-step or few-step generators, drawing on consistency distillation and progressive refinement ideas. Architectural and Representational Innovations introduce novel network designs, latent-space formulations, and geometric structures that improve expressiveness or computational efficiency. Domain-Specific Applications and Adaptations tailor flow matching to particular modalities such as images, speech, proteins, or video, addressing unique challenges in each setting. Together, these branches reflect a maturing ecosystem where theoretical advances in transport and velocity modeling coexist with practical efforts to scale and specialize generative flows.

Within the Core Flow Matching Frameworks branch, a particularly active line of work centers on mean-flow and average-velocity modeling, where methods like Splitmeanflow[2], High-Order Mean Flow[21], and Modular MeanFlow[31] refine how velocity fields are averaged or decomposed to yield more direct paths. Terminal Velocity Matching[0] sits naturally in this cluster, emphasizing terminal-time velocity alignment as a mechanism for straightening flows. Compared to neighbors such as Transport Mean Flows[49], which explores optimal-transport couplings, Terminal Velocity Matching[0] focuses more explicitly on endpoint behavior to reduce curvature.
Meanwhile, distillation-oriented works like One Step Shortcut[4] and Distilled Decoding[5] pursue aggressive step reduction through teacher-student frameworks, trading off some trajectory fidelity for inference speed. The interplay between trajectory straightness, step count, and model capacity remains a central open question, with Terminal Velocity Matching[0] contributing a perspective that leverages boundary conditions to guide the learning process.

Claimed Contributions

Terminal Velocity Matching (TVM) framework

TVM is a new training framework that models transitions between any two diffusion timesteps by regularizing terminal velocity rather than initial velocity. Unlike prior flow matching methods, TVM matches the time derivative at the terminal time of trajectories, enabling single-stage training for one-step and few-step generation.

Retrieved papers: 10. Verdict: Can Refute.
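To make the two-time formulation concrete, here is a toy NumPy sketch (our illustrative assumption, not the paper's exact objective or notation): a model u(x, t, s) predicts the average velocity between times t and s along a linear interpolation path, so a perfect model maps noise to data in a single jump. The names `interpolate`, `true_average_velocity`, and `one_step_sample` are hypothetical helpers introduced for exposition.

```python
import numpy as np

# Toy sketch of two-time (average-velocity) modeling on linear paths:
# x_t = (1 - t) * x0 + t * x1, whose average velocity between any two
# times is the constant x1 - x0. This is for intuition only; TVM's
# actual training loss regularizes terminal-time behavior.

def interpolate(x0, x1, t):
    """Linear (rectified-flow style) interpolation between noise x0 and data x1."""
    return (1.0 - t) * x0 + t * x1

def true_average_velocity(x0, x1, t, s):
    """Average velocity of the linear path between times t and s (constant here)."""
    return x1 - x0

def one_step_sample(u, x_t, t, s):
    """Jump from time t to time s using a two-time velocity model u(x, t, s)."""
    return x_t + (s - t) * u(x_t, t, s)

rng = np.random.default_rng(0)
x1 = rng.normal(size=4)   # stand-in "data" sample
x0 = rng.normal(size=4)   # noise sample

# Oracle model that knows the true average velocity of this pair.
u = lambda x, t, s: true_average_velocity(x0, x1, t, s)

x_gen = one_step_sample(u, x0, t=0.0, s=1.0)
print(np.allclose(x_gen, x1))  # True: a perfect average-velocity model recovers x1
```

A perfect two-time model also jumps correctly from any intermediate time, which is what enables few-step refinement in addition to one-step sampling.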
Theoretical upper bound on 2-Wasserstein distance

The authors establish a formal connection between their training objective and distribution matching by proving that TVM upper bounds the 2-Wasserstein distance. This theoretical guarantee distinguishes TVM from prior trajectory matching methods that lack explicit distributional guarantees.

Retrieved papers: 10. Verdict: Can Refute.
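Schematically, and only as a hedged reading of the claim (the constant and loss symbols below are placeholders, not the paper's notation), a guarantee of this type takes the form:

```latex
% Schematic Lipschitz-conditioned transport bound (placeholder notation):
% the Wasserstein-2 distance between data and model is controlled by the loss.
W_2^2\bigl(p_{\mathrm{data}},\, p_\theta\bigr)
  \;\le\; C(L)\,\mathcal{L}_{\mathrm{TVM}}(\theta),
\qquad \text{when the model } u_\theta \text{ is } L\text{-Lipschitz},
```

so that driving the training loss toward zero forces the model distribution toward the data distribution in Wasserstein-2, which is exactly the kind of distributional guarantee the contribution claims prior trajectory matching methods lack.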
Fused Flash Attention kernel with JVP backward support

The authors introduce an efficient Flash Attention kernel that fuses Jacobian-Vector Product computation with the forward pass and supports backward propagation through JVP results. This implementation achieves up to 65% speedup and significant memory reduction compared to standard PyTorch operations, making TVM practical for large-scale transformer training.

Retrieved papers: 6.
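For intuition about what such a kernel must fuse, here is a NumPy sketch (an expository assumption, not the authors' fused GPU kernel): single-head attention together with its Jacobian-vector product with respect to (Q, K, V) along tangents (dQ, dK, dV), computed in one pass and checked against finite differences.

```python
import numpy as np

# Reference math that a JVP-aware attention kernel fuses into its forward
# pass: the attention output O and its directional derivative dO. A real
# fused kernel computes the same quantities tile-by-tile on the GPU and
# additionally supports backpropagation through dO.

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_with_jvp(Q, K, V, dQ, dK, dV):
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)                  # attention logits
    dS = (dQ @ K.T + Q @ dK.T) / np.sqrt(d)   # tangent of the logits
    P = softmax(S)                            # attention weights
    # softmax JVP: dP_ij = P_ij * (dS_ij - sum_k P_ik * dS_ik)
    dP = P * (dS - (P * dS).sum(axis=-1, keepdims=True))
    O = P @ V
    dO = dP @ V + P @ dV                      # product rule for O = P @ V
    return O, dO

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
dQ, dK, dV = (rng.normal(size=(5, 8)) for _ in range(3))

O, dO = attention_with_jvp(Q, K, V, dQ, dK, dV)

# Finite-difference check that dO matches the directional derivative.
eps = 1e-6
O_eps, _ = attention_with_jvp(Q + eps * dQ, K + eps * dK, V + eps * dV, dQ, dK, dV)
print(np.allclose((O_eps - O) / eps, dO, atol=1e-4))  # should print True
```

The reported speedup and memory savings come from avoiding a second materialized attention pass for the JVP; the sketch above only captures what is computed, not how it is tiled.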

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Terminal Velocity Matching (TVM) framework


Contribution

Theoretical upper bound on 2-Wasserstein distance


Contribution

Fused Flash Attention kernel with JVP backward support
