FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: diffusion language model, few-step generation, flow matching
Abstract:

Autoregressive language models (ARMs) deliver strong likelihoods but are inherently serial: they generate one token per forward pass, which limits throughput and inflates latency for long sequences. Diffusion language models (DLMs) parallelize across positions and thus appear promising for language generation, yet standard discrete diffusion typically needs hundreds to thousands of model evaluations to reach high quality, trading serial depth for iterative breadth. We introduce FS-DFM (Few-Step Discrete Flow-Matching), a discrete flow-matching model designed for speed without sacrificing quality. The core idea is simple: make the number of sampling steps an explicit parameter and train the model to be consistent across step budgets, so that one big move lands where many small moves would. We pair this with a reliable update rule that moves probability in the right direction without overshooting, and with strong teacher guidance distilled from long-run trajectories. Together, these choices make few-step sampling stable, accurate, and easy to control. On language-modeling benchmarks, FS-DFM with 8 sampling steps matches the perplexity of a 1,024-step discrete-flow baseline of similar size when generating 1,024 tokens, delivering up to 128× faster sampling and corresponding latency/throughput gains.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces FS-DFM, a discrete flow-matching framework optimized for generating long text sequences in very few sampling steps (e.g., 8 steps achieving parity with 1,024-step baselines). It resides in the 'Few-Step Accelerated Text Generation' leaf, which contains only two papers total, including this one. This indicates a sparse research direction within the broader discrete flow-matching landscape. The taxonomy shows six papers across all branches, suggesting the field itself is relatively nascent, with few-step acceleration representing a focused but under-explored niche.

The taxonomy tree reveals that discrete flow-matching for text generation branches into few-step acceleration and variable-length sequence handling. Neighboring categories include discrete variational methods (e.g., discourse-aware latent variable models) and extensions to non-text modalities like protein design and streaming audio. The scope notes clarify that FS-DFM's emphasis on step-budget optimization distinguishes it from standard multi-step approaches and from latent-guided methods that rely on variational frameworks rather than flow-matching consistency training. This positioning suggests the work bridges efficiency concerns with generative quality in a relatively underexplored intersection.

Among 27 candidates examined, none clearly refuted any of the three core contributions: Few-Step Discrete Flow-Matching (10 candidates), Step-Aware Training with Shortcut Teacher (10 candidates), or Cumulative Scalar Update Rule (7 candidates). This limited search scope—focused on top-K semantic matches—indicates that within the examined subset, the specific combination of step-aware consistency training, teacher-guided distillation, and the proposed update rule appears novel. However, the analysis does not claim exhaustive coverage; broader literature may contain related techniques not captured in these 27 candidates.

Based on the limited search and sparse taxonomy leaf, the work appears to occupy a distinct position within few-step discrete flow methods for text. The lack of refutable pairs among examined candidates and the small sibling set (one other paper in the same leaf) suggest meaningful differentiation from prior approaches. Nonetheless, the modest candidate pool (27 papers) and the field's early stage mean this assessment reflects current search boundaries rather than definitive novelty across all possible related work.

Taxonomy

Core-task taxonomy papers: 6
Claimed contributions: 3
Contribution candidate papers compared: 27
Refutable papers: 0

Research Landscape Overview

Core task: few-step discrete flow-matching for long text generation. The field centers on developing efficient generative models that can produce high-quality text through discrete flow processes, often aiming to reduce the number of sampling steps required. The taxonomy reveals four main branches: one focused specifically on discrete flow matching for text generation, another exploring discrete variational and latent-guided approaches, a third extending discrete flow methods to non-text modalities such as proteins or images, and a fourth examining multi-step reasoning frameworks using generative flow networks. Within the text generation branch, works like Flow Matching Text[2] establish foundational techniques for applying flow-matching principles to discrete token spaces, while a small cluster of papers explores few-step acceleration strategies to make these methods practical for long-form content.

The non-text modalities branch includes efforts like Inverse Protein Folding[3], demonstrating that discrete flow ideas generalize beyond language, and the variational branch (e.g., DiscoDVT[4]) investigates latent representations to guide generation. A particularly active line of work concerns balancing generation quality with computational efficiency, especially for lengthy outputs where autoregressive decoding becomes prohibitively slow.

Fast Diffusion[0] sits squarely within the few-step accelerated text generation cluster, emphasizing rapid sampling without sacrificing coherence over extended sequences. This contrasts with Flow Matching Text[2], which provides a more general framework but does not specifically optimize for minimal steps, and with Edit Flows[5], which explores iterative refinement strategies that may require more iterations. Meanwhile, StreamFlow[6] investigates streaming or incremental generation patterns, offering a different angle on efficiency. The central tension across these branches is whether to prioritize step reduction, model expressiveness, or adaptability to diverse modalities, with Fast Diffusion[0] contributing a concrete solution for the few-step regime in long text scenarios.

Claimed Contributions

Few-Step Discrete Flow-Matching (FS-DFM)

The authors propose FS-DFM, a diffusion language model that achieves high-quality text generation in very few sampling steps (e.g., 8 steps) by making the number of steps an explicit training parameter and enforcing consistency across step budgets, enabling up to 128× faster sampling than standard discrete-flow baselines.
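Making the step budget an explicit input can be sketched as a sampler whose step size is passed to the model. This is a minimal illustration, not the paper's API: the name `few_step_sample` and the `model(x, t, dt)` signature are assumptions, and the per-step update is shown as a greedy argmax purely to keep the sketch deterministic (the actual method samples from the predicted distribution).

```python
import numpy as np

def few_step_sample(model, x0, num_steps):
    """Budget-conditioned sampling sketch: the step size dt = 1/num_steps
    is an explicit model input. A model trained to be consistent across
    budgets should land near the same result with 8 steps as with 1,024.
    `model(x, t, dt)` is assumed to return per-position token
    probabilities of shape (seq_len, vocab)."""
    x = x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        probs = model(x, t, dt)
        # Greedy update for illustration only; real sampling draws tokens.
        x = probs.argmax(axis=-1)
    return x
```

Because `dt` is a model input rather than a fixed training constant, the same network can be queried under different budgets at inference time.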

10 retrieved papers
Step-Aware Discrete Flow-Matching with Shortcut Teacher

The authors introduce a step-aware training approach that conditions the model on the intended step size and uses a shortcut teacher (implemented via Runge–Kutta ODE solvers) to distill long-run trajectories, ensuring that a single large step approximates the cumulative effect of many small updates.
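A shortcut teacher of this kind can be sketched with a classical fourth-order Runge–Kutta (RK4) integrator: the teacher rolls a probability-flow ODE forward with many fine steps, and the student's single large step is trained toward the resulting endpoint. The function names and the `v(p, t)` velocity interface below are assumptions for illustration; the paper's actual solver order and configuration may differ.

```python
import numpy as np

def rk4_step(v, p, t, dt):
    """One classical Runge-Kutta (RK4) step for the ODE dp/dt = v(p, t)."""
    k1 = v(p, t)
    k2 = v(p + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = v(p + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = v(p + dt * k3, t + dt)
    return p + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def teacher_target(v, p0, t0, t1, n_sub):
    """Hypothetical shortcut-teacher target: integrate the flow over
    [t0, t1] with n_sub fine RK4 steps. The student's single large step
    from (p0, t0) would be regressed onto this endpoint, so one big move
    approximates the cumulative effect of many small updates."""
    p, dt = p0, (t1 - t0) / n_sub
    for i in range(n_sub):
        p = rk4_step(v, p, t0 + i * dt, dt)
    return p
```

The teacher trajectory is computed without gradients in practice; only the student's one-step prediction is optimized against the endpoint.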

10 retrieved papers
Cumulative Scalar Update Rule

The authors develop a cumulative scalar formulation that integrates the scheduler over each finite step interval, replacing the instantaneous scale with a closed-form expression calibrated to both current time and step budget, enabling effective probability flow even in early steps of few-step sampling.
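Assuming a scheduler κ(t) that gives the fraction of probability mass moved by time t (with κ(0) = 0 and κ(1) = 1), the cumulative scale over a finite interval [t, t+dt] has a simple closed form, whereas the instantaneous Euler scale κ̇(t)/(1−κ(t))·dt mis-estimates the move when dt is large. The formulas below are a plausible instantiation of this idea, not necessarily the paper's exact scheduler.

```python
def instantaneous_scale(t, dt, kappa_dot, kappa):
    """Naive Euler scale: kappa_dot(t) / (1 - kappa(t)) * dt.
    Uses only the rate at the left endpoint, so it is inaccurate
    for the large dt of few-step sampling."""
    return kappa_dot(t) / (1.0 - kappa(t)) * dt

def cumulative_scale(t, dt, kappa):
    """Closed-form cumulative scale: integrating the rate over the whole
    interval gives the exact conditional probability of moving within
    [t, t+dt]: (kappa(t+dt) - kappa(t)) / (1 - kappa(t))."""
    return (kappa(t + dt) - kappa(t)) / (1.0 - kappa(t))
```

For a linear κ the two rules coincide; with a curved scheduler such as κ(t) = t², the Euler scale undershoots the exact interval probability at large dt, while the cumulative form correctly reaches 1 on the final step.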

7 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Few-Step Discrete Flow-Matching (FS-DFM)

The authors propose FS-DFM, a diffusion language model that achieves high-quality text generation in very few sampling steps (e.g., 8 steps) by making the number of steps an explicit training parameter and enforcing consistency across step budgets, enabling up to 128× faster sampling than standard discrete-flow baselines.

Contribution

Step-Aware Discrete Flow-Matching with Shortcut Teacher

The authors introduce a step-aware training approach that conditions the model on the intended step size and uses a shortcut teacher (implemented via Runge–Kutta ODE solvers) to distill long-run trajectories, ensuring that a single large step approximates the cumulative effect of many small updates.

Contribution

Cumulative Scalar Update Rule

The authors develop a cumulative scalar formulation that integrates the scheduler over each finite step interval, replacing the instantaneous scale with a closed-form expression calibrated to both current time and step budget, enabling effective probability flow even in early steps of few-step sampling.