CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow-Map Models

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: Flow Map Models, Consistency Models, Mean Flow, Mid-Training, Diffusion Models, Generative Models
Abstract:

Flow map models such as Consistency Models (CM) and Mean Flow (MF) enable few-step generation by learning the long jump of the ODE solution of diffusion models, yet training remains unstable, sensitive to hyperparameters, and costly. Initializing from a pre-trained diffusion model helps, but still requires converting infinitesimal steps into a long-jump map, leaving the instability unresolved. We introduce mid-training, the first concept and practical method that inserts a lightweight intermediate stage between (diffusion) pre-training and the final flow map training (i.e., post-training) for vision generation. Concretely, Consistency Mid-Training (CMT) is a compact and principled stage that trains a model to map points along a solver trajectory from a pre-trained model, starting from a prior sample, directly to the solver-generated clean sample. It yields a trajectory-consistent and stable initialization. This initializer outperforms random and diffusion-based baselines and enables fast, robust convergence without heuristics. Initializing post-training with CMT weights further simplifies flow map learning. Empirically, CMT achieves state-of-the-art two-step FIDs: 1.97 on CIFAR-10, 1.32 on ImageNet 64×64, and 1.84 on ImageNet 512×512, while using up to 98% less training data and GPU time than CMs. On ImageNet 256×256, CMT reaches a one-step FID of 3.34 while cutting total training time by about 50% relative to MF trained from scratch (FID 3.43). This establishes CMT as a principled, efficient, and general framework for training flow map models.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 19
Refutable Papers: 0

Research Landscape Overview

Core task: efficient training of few-step generative flow map models. The field has organized itself around several complementary directions. Flow Map Learning Frameworks establish foundational methods for learning direct mappings between noise and data distributions, exemplified by Flow Matching[3] and Flow Map Matching[20]. Training Strategies and Optimization explores how to make these models converge quickly and reliably, including multi-stage procedures like CMT Mid-Training[0] and Pre-Training GFlowNets[35]. Trajectory Optimization refines the paths taken during generation, while Model Architecture and Design addresses structural choices that enable few-step inference. Instantaneous Flow Matching and Normalizing Flow Models represent distinct theoretical perspectives on continuous-time generative modeling, with works like Normalizing Flows Capable[4] bridging classical normalizing flows and modern flow matching. Domain-Specific Applications and Auxiliary Techniques round out the taxonomy by addressing specialized use cases and supporting methods.

A particularly active line of work focuses on distillation and shortcut training to reduce sampling steps, with methods like Latent Consistency Models[7], Improved Shortcut Training[8], and Hyper-SD[16] exploring different trade-offs between sample quality and inference speed. Another dense branch investigates mean flow approaches, including Mean Flows[1], SplitMeanFlow[6], and Improved Mean Flows[14], which leverage averaged trajectories to simplify training.

CMT Mid-Training[0] sits within the multi-stage training cluster, emphasizing a phased approach that first establishes coarse flow structure before refining few-step generation. This contrasts with single-stage methods like Flow-Anchored Consistency[9] and self-distillation techniques such as Flow Maps Self-Distillation[12], which aim to achieve similar efficiency gains without explicit curriculum design.
The central tension across these branches involves balancing training complexity against final model simplicity, with CMT Mid-Training[0] representing a structured compromise that stages the learning process to achieve robust few-step performance.

Claimed Contributions

Consistency Mid-Training (CMT) framework

The authors propose CMT, a novel three-stage training pipeline that adds a compact mid-training phase between diffusion pre-training and flow map post-training. This stage trains a model to map points along solver trajectories directly to clean samples, providing a trajectory-consistent initialization that improves stability and convergence without requiring heuristics like stop-gradients or custom time weighting.

10 retrieved papers
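The mid-training stage described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the network architectures, the Heun solver, and all names (`teacher_velocity`, `student_map`, `cmt_loss`) are assumptions chosen for clarity. The key idea it demonstrates is that the student is regressed onto the solver's clean endpoint from arbitrary points on the teacher's trajectory.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 8  # toy data dimension

# Hypothetical stand-ins for the pre-trained diffusion model (teacher)
# and the flow-map initializer being mid-trained (student).
teacher_velocity = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
student_map = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

def apply_net(net, x, t):
    # Concatenate the scalar time onto each state vector.
    t_col = t.reshape(1, 1).expand(x.shape[0], 1)
    return net(torch.cat([x, t_col], dim=1))

@torch.no_grad()
def heun_trajectory(x_T, n_steps=8):
    """Deterministic solver trajectory from a prior sample at t=1 down to t=0."""
    ts = torch.linspace(1.0, 0.0, n_steps + 1)
    traj, x = [x_T], x_T
    for t0, t1 in zip(ts[:-1], ts[1:]):
        dt = t1 - t0
        d1 = apply_net(teacher_velocity, x, t0)
        x_euler = x + dt * d1                      # Euler predictor
        d2 = apply_net(teacher_velocity, x_euler, t1)
        x = x + dt * 0.5 * (d1 + d2)               # Heun corrector
        traj.append(x)
    return ts, traj

def cmt_loss(batch_size=16):
    x_T = torch.randn(batch_size, dim)             # prior sample
    ts, traj = heun_trajectory(x_T)
    x_clean = traj[-1]                             # solver-generated clean sample
    i = torch.randint(0, len(traj) - 1, (1,)).item()  # random trajectory point
    pred = apply_net(student_map, traj[i], ts[i])
    # Regress the trajectory point directly onto the clean endpoint.
    return ((pred - x_clean) ** 2).mean()

loss = cmt_loss()
loss.backward()
```

Because every regression target comes from the same deterministic teacher trajectory, the resulting initializer is trajectory-consistent by construction, which is what the contribution claims makes subsequent flow map post-training stable.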
Unified formulation of flow map objectives

The authors introduce a unified view connecting existing flow map formulations (Consistency Models and Mean Flow) through a reverse-time generative perspective. This reinterpretation clarifies the oracle objectives and motivates the design of CMT's training losses for both the special (Ψ_{t→0}) and general (Ψ_{t→s}) flow maps.

9 retrieved papers
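The unification described above can be made concrete with standard flow-map notation. This is a hedged reconstruction from the contribution summary, not the paper's exact derivation: for a probability-flow ODE $\dot{x}_u = v(x_u, u)$, the flow map transports a state from time $t$ to time $s$,

```latex
\Psi_{t \to s}(x_t) = x_t + \int_t^s v(x_u, u)\, du .
```

Consistency Models parameterize the special case $\Psi_{t \to 0}$ (a direct jump to data), while Mean Flow learns the average velocity over the interval,

```latex
u(x_t, t, s) = \frac{1}{t - s} \int_s^t v(x_u, u)\, du,
\qquad
\Psi_{t \to s}(x_t) = x_t - (t - s)\, u(x_t, t, s),
```

so both objectives are recovered as instances of one $\Psi_{t \to s}$ family, which is presumably what lets a single CMT initializer serve both post-training targets.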
Theoretical analysis of gradient bias reduction

The authors provide theoretical analysis demonstrating that CMT initialization yields gradient bias of O(ε + Δt²), significantly lower than diffusion-based initialization, which incurs additional bias terms from forward noising and posterior-mean mismatch. This formalizes why CMT provides a more robust starting point for flow map training.

0 retrieved papers
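The O(ε + Δt²) rate admits a plausible reading, sketched here under assumptions not stated in the summary (ε as the mid-training fit error and Δt as the teacher solver's step size, with a second-order solver such as Heun):

```latex
\bigl\| f_{\theta_{\mathrm{CMT}}}(x_t, t) - \Psi^{\star}_{t \to 0}(x_t) \bigr\|
\;\le\;
\underbrace{\varepsilon}_{\text{mid-training fit error}}
\;+\;
\underbrace{C\, \Delta t^2}_{\text{global solver error}} ,
```

so the bias of the post-training gradient at initialization inherits only these two terms. A diffusion-based initializer would instead carry extra terms from re-noising clean samples and from the mismatch between the denoiser's posterior mean and the flow-map target, consistent with the contribution's claim.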

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Consistency Mid-Training (CMT) framework

The authors propose CMT, a novel three-stage training pipeline that adds a compact mid-training phase between diffusion pre-training and flow map post-training. This stage trains a model to map points along solver trajectories directly to clean samples, providing a trajectory-consistent initialization that improves stability and convergence without requiring heuristics like stop-gradients or custom time weighting.

Contribution

Unified formulation of flow map objectives

The authors introduce a unified view connecting existing flow map formulations (Consistency Models and Mean Flow) through a reverse-time generative perspective. This reinterpretation clarifies the oracle objectives and motivates the design of CMT's training losses for both the special (Ψ_{t→0}) and general (Ψ_{t→s}) flow maps.

Contribution

Theoretical analysis of gradient bias reduction

The authors provide theoretical analysis demonstrating that CMT initialization yields gradient bias of O(ε + Δt²), significantly lower than diffusion-based initialization, which incurs additional bias terms from forward noising and posterior-mean mismatch. This formalizes why CMT provides a more robust starting point for flow map training.