Any-Order Flexible Length Masked Diffusion

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Diffusion Model, Generative Model, Discrete Diffusion, Stochastic Interpolant
Abstract:

Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. MDMs generate sequences in an any-order, parallel fashion, enabling fast inference and strong performance on non-causal tasks. However, a crucial limitation is that they do not support token insertions and are thus limited to fixed-length generation. To address this, we introduce Flexible Masked Diffusion Models (FlexMDMs), a discrete diffusion paradigm that can model sequences of flexible length while provably retaining MDMs' flexibility of any-order inference. Grounded in an extension of the stochastic interpolant framework, FlexMDMs generate sequences by inserting mask tokens and unmasking them. Empirically, we show that FlexMDMs match MDMs in perplexity while modeling length statistics with much higher fidelity. On a synthetic maze-planning task, they achieve a ≈60% higher success rate than MDM baselines. Finally, we show pretrained MDMs can easily be retrofitted into FlexMDMs: on 16 H100s, it takes only three days to fine-tune LLaDA-8B into a FlexMDM, yielding superior performance on math (GSM8K, 58%→67%) and code infilling (52%→65%).

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 39
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 0

Research Landscape Overview

Core task: variable-length sequence generation with masked diffusion models. The field has coalesced around several complementary directions. At the foundation, works such as Simple Masked Diffusion[1] and Simplified Masked Diffusion[4] establish core architectures and training procedures for discrete masked diffusion, while Diffusion Forcing[5] explores unified frameworks that bridge autoregressive and diffusion paradigms. A second major branch addresses variable-length generation and flexible decoding, where methods like Insertion Language Models[3] and Variable Length Diffusion LLM[11] tackle adaptive length prediction and non-autoregressive ordering. Inference optimization forms another active area, with Parallel Masked Diffusion Sampling[9] and Self-Speculative Masked Diffusions[23] accelerating sampling through speculative or parallel strategies. Finally, diverse application domains—from motion synthesis (Generative Masked Motion[2], Length-Aware Motion Synthesis[12]) to specialized modalities like protein design (RoseTTAFold Sequence Diffusion[34]) and recommendation systems (Masked Diffusion Recommendation[26])—demonstrate the breadth of masked diffusion's reach.

Within the variable-length generation branch, a central challenge is balancing flexibility in output length with coherent structure and efficient sampling. Works like Variable Length Diffusion LLM[11] and Variable-Length Denoising[17] explore explicit length conditioning and adaptive stopping criteria, while Insertion Language Models[3] investigates order-agnostic generation that can insert tokens at arbitrary positions. Any-Order Flexible Masked Diffusion[0] sits naturally in this cluster, emphasizing flexible decoding orders that allow dynamic control over sequence construction without rigid left-to-right constraints.

Compared to Variable Length Diffusion LLM[11], which focuses on length prediction mechanisms, and Variable-Length Denoising[17], which addresses denoising schedules for varying lengths, the original work appears to prioritize the ordering flexibility itself—enabling any-order masking patterns that adapt to task-specific needs. This positioning reflects ongoing debates about whether length control, decoding order, or sampling efficiency should take precedence in making masked diffusion practical for real-world variable-length tasks.

Claimed Contributions

Flexible Masked Diffusion Models (FlexMDMs)

The authors propose FlexMDMs, a new discrete diffusion framework that extends masked diffusion models to handle variable-length sequences through token insertion and unmasking operations, while preserving the any-order generation capability of standard MDMs. This addresses a key limitation of existing MDMs, which are restricted to fixed-length generation.
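The insert-then-unmask generation loop described above can be sketched as a toy simulation, with random choices standing in for the learned insertion rates and denoiser. All function names and probabilities below are illustrative assumptions, not the paper's actual method or API:

```python
import random

MASK = "<mask>"

def generate_flexmdm_toy(insert_prob, vocab, max_steps=100, seed=0):
    """Toy sketch of FlexMDM-style generation: the sequence grows by
    inserting mask tokens at arbitrary gaps, and masks are filled in
    (unmasked) in any order. A real FlexMDM would predict insertion
    rates and token distributions with a network; here both are random."""
    rng = random.Random(seed)
    seq = [MASK]  # start from a single mask token
    for _ in range(max_steps):
        # Insertion step: with some probability, grow the sequence by
        # placing a new mask token at a random gap (any-order growth).
        if rng.random() < insert_prob:
            pos = rng.randint(0, len(seq))  # gap index, ends inclusive
            seq.insert(pos, MASK)
        # Unmasking step: pick any remaining mask and fill it with a
        # token (a trained denoiser would supply this distribution).
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if masked:
            i = rng.choice(masked)
            seq[i] = rng.choice(vocab)
        elif rng.random() > insert_prob:
            break  # no masks left and growth has stopped: halt
    return seq

sample = generate_flexmdm_toy(insert_prob=0.3, vocab=list("abc"))
print(sample)  # a variable-length sequence; length differs across seeds
```

The point of the sketch is structural: unlike a fixed-length MDM, the sequence length is not chosen up front, but emerges from interleaved insertion and unmasking events.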

10 retrieved papers
Joint interpolant framework for variable-length modeling

The authors develop a theoretical foundation based on extending the stochastic interpolant framework to discrete spaces. They introduce a joint interpolant that augments the process with an auxiliary variable tracking token positions, enabling closed-form rate matrix characterization for variable-length sequence generation.
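As a loose illustration of what a closed-form rate characterization buys at sampling time, a Gillespie-style simulation of sequence-length growth under a state-dependent insertion rate looks like the following. The rate function here is a made-up stand-in, not the paper's derived rate matrix (which depends on the joint interpolant schedule and the learned model):

```python
import random

def gillespie_insertions(rate_fn, t_end=1.0, seed=0):
    """Hedged sketch: simulate token-count growth as a continuous-time
    jump process. `rate_fn(n)` gives the total insertion rate when the
    sequence currently has n tokens. Because the rate depends only on
    the current state, sampling an exponential holding time at that
    rate is the exact Gillespie update for this toy process."""
    rng = random.Random(seed)
    t, n, events = 0.0, 0, []
    while True:
        lam = rate_fn(n)
        if lam <= 0:
            break
        t += rng.expovariate(lam)  # exponential holding time at rate lam
        if t >= t_end:
            break  # process observed only on [0, t_end)
        n += 1
        events.append((t, n))
    return n, events

# illustrative rate: insertions slow down as the sequence grows
final_len, trace = gillespie_insertions(lambda n: 5.0 / (1 + n))
print(final_len, trace[:3])
```

In the paper's framework the analogous rates would come in closed form from the joint interpolant, so the sampler never has to estimate them numerically.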

5 retrieved papers
Efficient retrofitting of pretrained MDMs into FlexMDMs

The authors demonstrate that existing pretrained masked diffusion models can be efficiently converted into FlexMDMs through minimal architectural modifications and fine-tuning. This enables scaling to 8B parameters with only three days of training on 16 H100 GPUs, achieving substantial performance improvements on downstream tasks.
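A minimal sketch of the retrofitting idea, assuming the pretrained backbone's hidden states are reused unchanged and only a small, newly initialized insertion head is trained from scratch. The shapes, the softplus parameterization, and every name below are illustrative assumptions rather than the paper's actual architecture:

```python
import math
import random

def retrofit_insertion_head(hidden_dim, seed=0):
    """Hedged sketch of retrofitting: keep the pretrained MDM backbone
    as-is and bolt on a tiny linear head that maps each position's
    hidden state to a nonnegative expected number of mask insertions
    after that position. These are the only weights initialized fresh."""
    rng = random.Random(seed)
    W = [rng.gauss(0.0, 0.02) for _ in range(hidden_dim)]
    b = 0.0
    def head(hidden_states):
        # softplus over a dot product keeps predicted counts nonnegative
        return [math.log1p(math.exp(sum(w * h for w, h in zip(W, hs)) + b))
                for hs in hidden_states]
    return head

head = retrofit_insertion_head(hidden_dim=16)
h = [[0.0] * 16 for _ in range(4)]  # stand-in for backbone hidden states
print(head(h))  # four identical values: softplus(0) ≈ 0.6931
```

Because only the new head starts from random initialization, fine-tuning mostly has to teach insertion behavior rather than relearn the language model, which is consistent with the short three-day conversion budget reported above.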

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Flexible Masked Diffusion Models (FlexMDMs)

The authors propose FlexMDMs, a new discrete diffusion framework that extends masked diffusion models to handle variable-length sequences through token insertion and unmasking operations, while preserving the any-order generation capability of standard MDMs. This addresses a key limitation of existing MDMs, which are restricted to fixed-length generation.

Contribution

Joint interpolant framework for variable-length modeling

The authors develop a theoretical foundation based on extending the stochastic interpolant framework to discrete spaces. They introduce a joint interpolant that augments the process with an auxiliary variable tracking token positions, enabling closed-form rate matrix characterization for variable-length sequence generation.

Contribution

Efficient retrofitting of pretrained MDMs into FlexMDMs

The authors demonstrate that existing pretrained masked diffusion models can be efficiently converted into FlexMDMs through minimal architectural modifications and fine-tuning. This enables scaling to 8B parameters with only three days of training on 16 H100 GPUs, achieving substantial performance improvements on downstream tasks.