FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: diffusion language model, few-step generation, flow matching
Abstract:

Autoregressive language models (ARMs) deliver strong likelihoods but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. Diffusion language models (DLMs) parallelize across positions and thus appear promising for language generation, yet standard discrete diffusion typically needs hundreds to thousands of model evaluations to reach high quality, trading serial depth for iterative breadth. We introduce FS-DFM (Few-Step Discrete Flow-Matching), a discrete flow-matching model designed for speed without sacrificing quality. The core idea is simple: make the number of sampling steps an explicit parameter and train the model to be consistent across step budgets, so that one big move lands where many small moves would. We pair this with a reliable update rule that moves probability in the right direction without overshooting, and with strong teacher guidance distilled from long-run trajectories. Together, these choices make few-step sampling stable, accurate, and easy to control. On language-modeling benchmarks, FS-DFM with 8 sampling steps matches the perplexity of a 1,024-step discrete-flow baseline of similar size when generating 1,024 tokens, delivering up to 128× faster sampling and corresponding latency/throughput gains.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces FS-DFM, a discrete flow-matching framework optimized for generating long text sequences in very few sampling steps (e.g., 8 steps achieving parity with 1,024-step baselines). It resides in the 'Few-Step Accelerated Text Generation' leaf, which contains only two papers total, including this one. This indicates a sparse research direction within the broader discrete flow-matching landscape. The taxonomy shows six papers across all branches, suggesting the field itself is relatively nascent, with few-step acceleration representing a focused but under-explored niche.

The taxonomy tree reveals that discrete flow-matching for text generation branches into few-step acceleration and variable-length sequence handling. Neighboring categories include discrete variational methods (e.g., discourse-aware latent variable models) and extensions to non-text modalities like protein design and streaming audio. The scope notes clarify that FS-DFM's emphasis on step-budget optimization distinguishes it from standard multi-step approaches and from latent-guided methods that rely on variational frameworks rather than flow-matching consistency training. This positioning suggests the work bridges efficiency concerns with generative quality in a relatively underexplored intersection.

Among 27 candidates examined, none clearly refuted any of the three core contributions: Few-Step Discrete Flow-Matching (10 candidates), Step-Aware Training with Shortcut Teacher (10 candidates), or Cumulative Scalar Update Rule (7 candidates). This limited search scope—focused on top-K semantic matches—indicates that within the examined subset, the specific combination of step-aware consistency training, teacher-guided distillation, and the proposed update rule appears novel. However, the analysis does not claim exhaustive coverage; broader literature may contain related techniques not captured in these 27 candidates.

Based on the limited search and sparse taxonomy leaf, the work appears to occupy a distinct position within few-step discrete flow methods for text. The lack of refutable pairs among examined candidates and the small sibling set (one other paper in the same leaf) suggest meaningful differentiation from prior approaches. Nonetheless, the modest candidate pool (27 papers) and the field's early stage mean this assessment reflects current search boundaries rather than definitive novelty across all possible related work.

Taxonomy

Core-task taxonomy papers: 6
Claimed contributions: 3
Contribution candidate papers compared: 27
Refutable papers: 0

Research Landscape Overview

Core task: few-step discrete flow-matching for long text generation. The field centers on developing efficient generative models that can produce high-quality text through discrete flow processes, often aiming to reduce the number of sampling steps required. The taxonomy reveals four main branches: one focused specifically on discrete flow matching for text generation, another exploring discrete variational and latent-guided approaches, a third extending discrete flow methods to non-text modalities such as proteins or images, and a fourth examining multi-step reasoning frameworks using generative flow networks. Within the text generation branch, works like Flow Matching Text[2] establish foundational techniques for applying flow-matching principles to discrete token spaces, while a small cluster of papers explores few-step acceleration strategies to make these methods practical for long-form content.

The non-text modalities branch includes efforts like Inverse Protein Folding[3], demonstrating that discrete flow ideas generalize beyond language, and the variational branch (e.g., DiscoDVT[4]) investigates latent representations to guide generation. A particularly active line of work concerns balancing generation quality with computational efficiency, especially for lengthy outputs where autoregressive decoding becomes prohibitively slow.

Fast Diffusion[0] sits squarely within the few-step accelerated text generation cluster, emphasizing rapid sampling without sacrificing coherence over extended sequences. This contrasts with Flow Matching Text[2], which provides a more general framework but does not specifically optimize for minimal steps, and with Edit Flows[5], which explores iterative refinement strategies that may require more iterations. Meanwhile, StreamFlow[6] investigates streaming or incremental generation patterns, offering a different angle on efficiency. The central tension across these branches is whether to prioritize step reduction, model expressiveness, or adaptability to diverse modalities, with Fast Diffusion[0] contributing a concrete solution for the few-step regime in long text scenarios.

Claimed Contributions

Few-Step Discrete Flow-Matching (FS-DFM)

The authors propose FS-DFM, a diffusion language model that achieves high-quality text generation in very few sampling steps (e.g., 8 steps) by making the number of steps an explicit training parameter and enforcing consistency across step budgets, enabling up to 128× faster sampling than standard discrete-flow baselines.
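Making the step budget an explicit input can be sketched as a sampler whose step size is passed to the model. This is a minimal illustration, not the paper's API: the name `few_step_sample` and the `model(x, t, dt)` signature are assumptions, and the per-step update is shown as a greedy argmax purely to keep the sketch deterministic (the actual method samples from the predicted distribution).

```python
import numpy as np

def few_step_sample(model, x0, num_steps):
    """Budget-conditioned sampling sketch: the step size dt = 1/num_steps
    is an explicit model input. A model trained to be consistent across
    budgets should land near the same result with 8 steps as with 1,024.
    `model(x, t, dt)` is assumed to return per-position token
    probabilities of shape (seq_len, vocab)."""
    x = x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        probs = model(x, t, dt)
        # Greedy update for illustration only; real sampling draws tokens.
        x = probs.argmax(axis=-1)
    return x
```

Because `dt` is a model input rather than a fixed training constant, the same network can be queried under different budgets at inference time.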

10 retrieved papers
Step-Aware Discrete Flow-Matching with Shortcut Teacher

The authors introduce a step-aware training approach that conditions the model on the intended step size and uses a shortcut teacher (implemented via Runge–Kutta ODE solvers) to distill long-run trajectories, ensuring that a single large step approximates the cumulative effect of many small updates.
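A shortcut teacher of this kind can be sketched with a classical fourth-order Runge–Kutta (RK4) integrator: the teacher rolls a probability-flow ODE forward with many fine steps, and the student's single large step is trained toward the resulting endpoint. The function names and the `v(p, t)` velocity interface below are assumptions for illustration; the paper's actual solver order and configuration may differ.

```python
import numpy as np

def rk4_step(v, p, t, dt):
    """One classical Runge-Kutta (RK4) step for the ODE dp/dt = v(p, t)."""
    k1 = v(p, t)
    k2 = v(p + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = v(p + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = v(p + dt * k3, t + dt)
    return p + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def teacher_target(v, p0, t0, t1, n_sub):
    """Hypothetical shortcut-teacher target: integrate the flow over
    [t0, t1] with n_sub fine RK4 steps. The student's single large step
    from (p0, t0) would be regressed onto this endpoint, so one big move
    approximates the cumulative effect of many small updates."""
    p, dt = p0, (t1 - t0) / n_sub
    for i in range(n_sub):
        p = rk4_step(v, p, t0 + i * dt, dt)
    return p
```

The teacher trajectory is computed without gradients in practice; only the student's one-step prediction is optimized against the endpoint.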

10 retrieved papers
Cumulative Scalar Update Rule

The authors develop a cumulative scalar formulation that integrates the scheduler over each finite step interval, replacing the instantaneous scale with a closed-form expression calibrated to both current time and step budget, enabling effective probability flow even in early steps of few-step sampling.
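Assuming a scheduler κ(t) that gives the fraction of probability mass moved by time t (with κ(0) = 0 and κ(1) = 1), the cumulative scale over a finite interval [t, t+dt] has a simple closed form, whereas the instantaneous Euler scale κ̇(t)/(1−κ(t))·dt mis-estimates the move when dt is large. The formulas below are a plausible instantiation of this idea, not necessarily the paper's exact scheduler.

```python
def instantaneous_scale(t, dt, kappa_dot, kappa):
    """Naive Euler scale: kappa_dot(t) / (1 - kappa(t)) * dt.
    Uses only the rate at the left endpoint, so it is inaccurate
    for the large dt of few-step sampling."""
    return kappa_dot(t) / (1.0 - kappa(t)) * dt

def cumulative_scale(t, dt, kappa):
    """Closed-form cumulative scale: integrating the rate over the whole
    interval gives the exact conditional probability of moving within
    [t, t+dt]: (kappa(t+dt) - kappa(t)) / (1 - kappa(t))."""
    return (kappa(t + dt) - kappa(t)) / (1.0 - kappa(t))
```

For a linear κ the two rules coincide; with a curved scheduler such as κ(t) = t², the Euler scale undershoots the exact interval probability at large dt, while the cumulative form correctly reaches 1 on the final step.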

7 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Few-Step Discrete Flow-Matching (FS-DFM)

The authors propose FS-DFM, a diffusion language model that achieves high-quality text generation in very few sampling steps (e.g., 8 steps) by making the number of steps an explicit training parameter and enforcing consistency across step budgets, enabling up to 128× faster sampling than standard discrete-flow baselines.

Contribution

Step-Aware Discrete Flow-Matching with Shortcut Teacher

The authors introduce a step-aware training approach that conditions the model on the intended step size and uses a shortcut teacher (implemented via Runge–Kutta ODE solvers) to distill long-run trajectories, ensuring that a single large step approximates the cumulative effect of many small updates.

Contribution

Cumulative Scalar Update Rule

The authors develop a cumulative scalar formulation that integrates the scheduler over each finite step interval, replacing the instantaneous scale with a closed-form expression calibrated to both current time and step budget, enabling effective probability flow even in early steps of few-step sampling.