CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: Flow Map Models, Consistency Models, Mean Flow, Mid-Training, Diffusion Models, Generative Models
Abstract:

Flow map models such as Consistency Models (CM) and Mean Flow (MF) enable few-step generation by learning the long jump of the ODE solution of diffusion models, yet training remains unstable, sensitive to hyperparameters, and costly. Initializing from a pre-trained diffusion model helps, but still requires converting infinitesimal steps into a long-jump map, leaving the instability unresolved. We introduce mid-training, the first concept and practical method that inserts a lightweight intermediate stage between (diffusion) pre-training and the final flow map training (i.e., post-training) for vision generation. Concretely, Consistency Mid-Training (CMT) is a compact and principled stage that trains a model to map points along a solver trajectory from a pre-trained model, starting from a prior sample, directly to the solver-generated clean sample. It yields a trajectory-consistent and stable initialization. This initializer outperforms random and diffusion-based baselines and enables fast, robust convergence without heuristics. Initializing post-training with CMT weights further simplifies flow map learning. Empirically, CMT achieves state-of-the-art two-step FIDs: 1.97 on CIFAR-10, 1.32 on ImageNet 64×64, and 1.84 on ImageNet 512×512, while using up to 98% less training data and GPU time than CMs. On ImageNet 256×256, CMT reaches a one-step FID of 3.34 while cutting total training time by about 50% relative to MF trained from scratch (FID 3.43). This establishes CMT as a principled, efficient, and general framework for training flow map models.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 19
Refutable Papers: 0

Research Landscape Overview

Core task: efficient training of few-step generative flow map models. The field has organized itself around several complementary directions. Flow Map Learning Frameworks establish foundational methods for learning direct mappings between noise and data distributions, exemplified by Flow Matching[3] and Flow Map Matching[20]. Training Strategies and Optimization explores how to make these models converge quickly and reliably, including multi-stage procedures like CMT Mid-Training[0] and Pre-Training GFlowNets[35]. Trajectory Optimization refines the paths taken during generation, while Model Architecture and Design addresses structural choices that enable few-step inference. Instantaneous Flow Matching and Normalizing Flow Models represent distinct theoretical perspectives on continuous-time generative modeling, with works like Normalizing Flows Capable[4] bridging classical normalizing flows and modern flow matching. Domain-Specific Applications and Auxiliary Techniques round out the taxonomy by addressing specialized use cases and supporting methods.

A particularly active line of work focuses on distillation and shortcut training to reduce sampling steps, with methods like Latent Consistency Models[7], Improved Shortcut Training[8], and Hyper-SD[16] exploring different trade-offs between sample quality and inference speed. Another dense branch investigates mean flow approaches, including Mean Flows[1], SplitMeanFlow[6], and Improved Mean Flows[14], which leverage averaged trajectories to simplify training.

CMT Mid-Training[0] sits within the multi-stage training cluster, emphasizing a phased approach that first establishes coarse flow structure before refining few-step generation. This contrasts with single-stage methods like Flow-Anchored Consistency[9] and self-distillation techniques such as Flow Maps Self-Distillation[12], which aim to achieve similar efficiency gains without explicit curriculum design.
The central tension across these branches involves balancing training complexity against final model simplicity, with CMT Mid-Training[0] representing a structured compromise that stages the learning process to achieve robust few-step performance.

Claimed Contributions

Consistency Mid-Training (CMT) framework

The authors propose CMT, a novel three-stage training pipeline that adds a compact mid-training phase between diffusion pre-training and flow map post-training. This stage trains a model to map points along solver trajectories directly to clean samples, providing a trajectory-consistent initialization that improves stability and convergence without requiring heuristics like stop-gradients or custom time weighting.

10 retrieved papers
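The mid-training stage described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the network architectures, the Heun solver, and all names (`teacher_velocity`, `student_map`, `cmt_loss`) are assumptions chosen for clarity. The key idea it demonstrates is that the student is regressed onto the solver's clean endpoint from arbitrary points on the teacher's trajectory.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 8  # toy data dimension

# Hypothetical stand-ins for the pre-trained diffusion model (teacher)
# and the flow-map initializer being mid-trained (student).
teacher_velocity = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
student_map = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

def apply_net(net, x, t):
    # Concatenate the scalar time onto each state vector.
    t_col = t.reshape(1, 1).expand(x.shape[0], 1)
    return net(torch.cat([x, t_col], dim=1))

@torch.no_grad()
def heun_trajectory(x_T, n_steps=8):
    """Deterministic solver trajectory from a prior sample at t=1 down to t=0."""
    ts = torch.linspace(1.0, 0.0, n_steps + 1)
    traj, x = [x_T], x_T
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = t1 - t0
        d1 = apply_net(teacher_velocity, x, t0)
        x_euler = x + dt * d1                      # Euler predictor
        d2 = apply_net(teacher_velocity, x_euler, t1)
        x = x + dt * 0.5 * (d1 + d2)               # Heun corrector
        traj.append(x)
    return ts, traj

def cmt_loss(batch_size=16):
    x_T = torch.randn(batch_size, dim)             # prior sample
    ts, traj = heun_trajectory(x_T)
    x_clean = traj[-1]                             # solver-generated clean sample
    i = torch.randint(0, len(traj) - 1, (1,)).item()  # random trajectory point
    pred = apply_net(student_map, traj[i], ts[i])
    # Regress the trajectory point directly onto the clean endpoint.
    return ((pred - x_clean) ** 2).mean()

loss = cmt_loss()
loss.backward()
```

Because every regression target comes from the same deterministic teacher trajectory, the resulting initializer is trajectory-consistent by construction, which is what the contribution claims makes subsequent flow map post-training stable.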
Unified formulation of flow map objectives

The authors introduce a unified view connecting existing flow map formulations (Consistency Models and Mean Flow) through a reverse-time generative perspective. This reinterpretation clarifies the oracle objectives and motivates the design of CMT's training losses for both the special (Ψ_{t→0}) and general (Ψ_{t→s}) flow maps.

9 retrieved papers
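The unification described above can be made concrete with standard flow-map notation. This is a hedged reconstruction from the contribution summary, not the paper's exact derivation: for a probability-flow ODE $\dot{x}_u = v(x_u, u)$, the flow map transports a state from time $t$ to time $s$,

```latex
\Psi_{t \to s}(x_t) = x_t + \int_t^s v(x_u, u)\, du .
```

Consistency Models parameterize the special case $\Psi_{t \to 0}$ (a direct jump to data), while Mean Flow learns the average velocity over the interval,

```latex
u(x_t, t, s) = \frac{1}{t - s} \int_s^t v(x_u, u)\, du,
\qquad
\Psi_{t \to s}(x_t) = x_t - (t - s)\, u(x_t, t, s),
```

so both objectives are recovered as instances of one $\Psi_{t \to s}$ family, which is presumably what lets a single CMT initializer serve both post-training targets.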
Theoretical analysis of gradient bias reduction

The authors provide theoretical analysis demonstrating that CMT initialization yields gradient bias of O(ε + Δt²), significantly lower than diffusion-based initialization, which incurs additional bias terms from forward noising and posterior-mean mismatch. This formalizes why CMT provides a more robust starting point for flow map training.

0 retrieved papers
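The O(ε + Δt²) rate admits a plausible reading, sketched here under assumptions not stated in the summary (ε as the mid-training fit error and Δt as the teacher solver's step size, with a second-order solver such as Heun):

```latex
\bigl\| f_{\theta_{\mathrm{CMT}}}(x_t, t) - \Psi^{\star}_{t \to 0}(x_t) \bigr\|
\;\le\;
\underbrace{\varepsilon}_{\text{mid-training fit error}}
\;+\;
\underbrace{C\, \Delta t^2}_{\text{global solver error}} ,
```

so the bias of the post-training gradient at initialization inherits only these two terms. A diffusion-based initializer would instead carry extra terms from re-noising clean samples and from the mismatch between the denoiser's posterior mean and the flow-map target, consistent with the contribution's claim.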

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Consistency Mid-Training (CMT) framework

The authors propose CMT, a novel three-stage training pipeline that adds a compact mid-training phase between diffusion pre-training and flow map post-training. This stage trains a model to map points along solver trajectories directly to clean samples, providing a trajectory-consistent initialization that improves stability and convergence without requiring heuristics like stop-gradients or custom time weighting.

Contribution

Unified formulation of flow map objectives

The authors introduce a unified view connecting existing flow map formulations (Consistency Models and Mean Flow) through a reverse-time generative perspective. This reinterpretation clarifies the oracle objectives and motivates the design of CMT's training losses for both the special (Ψ_{t→0}) and general (Ψ_{t→s}) flow maps.

Contribution

Theoretical analysis of gradient bias reduction

The authors provide theoretical analysis demonstrating that CMT initialization yields gradient bias of O(ε + Δt²), significantly lower than diffusion-based initialization, which incurs additional bias terms from forward noising and posterior-mean mismatch. This formalizes why CMT provides a more robust starting point for flow map training.