Any-Order Flexible Length Masked Diffusion
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose FlexMDMs, a new discrete diffusion framework that extends masked diffusion models to variable-length sequences through token insertion and unmasking operations, while preserving the any-order generation capability of standard MDMs. This addresses a key limitation of existing MDMs, which are restricted to fixed-length generation.
The authors develop a theoretical foundation by extending the stochastic interpolant framework to discrete state spaces. They introduce a joint interpolant that augments the process with an auxiliary variable tracking token positions, yielding a closed-form characterization of the rate matrix for variable-length sequence generation.
The authors demonstrate that existing pretrained masked diffusion models can be efficiently converted into FlexMDMs through minimal architectural modifications and fine-tuning. This enables scaling to 8B parameters with only three days of training on 16 H100 GPUs, achieving substantial performance improvements on downstream tasks.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[11] Diffusion LLM with native variable generation lengths: Let lead the way PDF
[17] Beyond fixed: Training-free variable-length denoising for diffusion large language models PDF
Contribution Analysis
Detailed comparisons for each claimed contribution
Flexible Masked Diffusion Models (FlexMDMs)
The authors propose FlexMDMs, a new discrete diffusion framework that extends masked diffusion models to variable-length sequences through token insertion and unmasking operations, while preserving the any-order generation capability of standard MDMs. This addresses a key limitation of existing MDMs, which are restricted to fixed-length generation.
[7] Sequential diffusion language models PDF
[40] Block diffusion: Interpolating between autoregressive and diffusion language models PDF
[41] Diffsound: Discrete diffusion model for text-to-sound generation PDF
[42] Diffuseq: Sequence to sequence text generation with diffusion models PDF
[43] Discrete diffusion models for language generation PDF
[44] DiffRhythm: Blazingly fast and embarrassingly simple end-to-end full-length song generation with latent diffusion PDF
[45] dKV-Cache: The cache for diffusion language models PDF
[46] DiffListener: Discrete Diffusion Model for Listener Generation PDF
[47] Non-autoregressive diffusion-based temporal point processes for continuous-time long-term event prediction PDF
[48] Vector quantized diffusion model with codeunet for text-to-sign pose sequences generation PDF
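The insertion-plus-unmasking dynamics claimed above can be illustrated as a single reverse-time sampler step. This is a minimal sketch under simplifying assumptions: the uniform `insert_prob`/`unmask_prob` coins and the `unmask_fn` denoiser stub stand in for the paper's learned rates, and all names are hypothetical.

```python
import random

MASK = "<mask>"

def flexmdm_step(seq, insert_prob, unmask_prob, unmask_fn):
    """One reverse-time step of a FlexMDM-style sampler (illustrative sketch).

    Two operations act in parallel: insertion grows the sequence by adding
    fresh mask slots, and unmasking reveals existing masks in any order.
    A fixed-length MDM performs only the second operation.
    """
    grown = []
    for tok in seq:
        # Insertion operation: possibly add a new mask slot before each token.
        if random.random() < insert_prob:
            grown.append(MASK)
        grown.append(tok)
    # Unmasking operation: any-order reveal of a random subset of mask slots.
    return [unmask_fn(i, grown) if t == MASK and random.random() < unmask_prob else t
            for i, t in enumerate(grown)]
```

With `insert_prob = 0` this reduces to a standard fixed-length MDM step, which is how the framework subsumes the existing models compared against here.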
Joint interpolant framework for variable-length modeling
The authors develop a theoretical foundation by extending the stochastic interpolant framework to discrete state spaces. They introduce a joint interpolant that augments the process with an auxiliary variable tracking token positions, yielding a closed-form characterization of the rate matrix for variable-length sequence generation.
[59] Elimination of systematic error in digital image correlation caused by intensity interpolation by introducing position randomness to subset points PDF
[60] Quantitative properties of sovereign default models: solution methods matter PDF
[61] Stochastic signal processing in adaptive measurement systems with rough space-time statistics: Method of invertible spectral analysis PDF
[62] Atmospheric water vapor maps generations from stochastic interpolation of GNSS Zenith Tropospheric Delays PDF
[63] Uncertainty Analysis of the Dynamic Response of a Randomly Parametrized Corrugated Skin PDF
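The role of the auxiliary position variable can be sketched on the forward (noising) side. In this toy version, each clean token independently survives deletion and may then be masked; the factorized Bernoulli schedule is an assumption made for illustration, not the paper's exact construction, and `pos` is the auxiliary bookkeeping that makes reverse insertion rates tractable.

```python
import random

MASK = "<mask>"

def joint_interpolant(x1, keep_prob, mask_prob):
    """Sample the forward joint interpolant at one noise level (sketch).

    Each clean token of `x1` survives deletion with `keep_prob`; a
    surviving token is replaced by MASK with `mask_prob`. The auxiliary
    variable `pos` records each surviving token's original index, so the
    reverse process knows where deleted tokens must be re-inserted.
    """
    xt, pos = [], []
    for i, tok in enumerate(x1):
        if random.random() < keep_prob:   # token still present at time t
            xt.append(MASK if random.random() < mask_prob else tok)
            pos.append(i)                 # auxiliary position variable
    return xt, pos
```

At `keep_prob = 1` no tokens are deleted and the construction collapses to the ordinary masked-diffusion interpolant over fixed-length sequences.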
Efficient retrofitting of pretrained MDMs into FlexMDMs
The authors demonstrate that existing pretrained masked diffusion models can be efficiently converted into FlexMDMs through minimal architectural modifications and fine-tuning. This enables scaling to 8B parameters with only three days of training on 16 H100 GPUs, achieving substantial performance improvements on downstream tasks.
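The retrofitting claim suggests one natural implementation pattern: reuse the pretrained denoiser unchanged and bolt on a small insertion head initialized so that the wrapped model starts out behaving exactly like the original fixed-length MDM. This is a hedged sketch of that pattern; the class and method names are hypothetical and do not reflect the paper's code.

```python
import math

class RetrofitFlexMDM:
    """Wrap a pretrained MDM denoiser with an insertion head (sketch).

    The backbone's unmasking predictions pass through untouched; a new
    linear head on its hidden states scores where to insert tokens. With
    zero weights and a strongly negative bias, insertion probabilities
    start near zero, so fine-tuning departs from pretrained behaviour.
    """
    def __init__(self, backbone, hidden_dim):
        self.backbone = backbone
        self.w = [0.0] * hidden_dim   # zero-initialized insertion head
        self.b = -10.0                # sigmoid(-10) ~ 0: no insertions yet

    def insert_prob(self, h):
        z = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def forward(self, tokens):
        # Pretrained path is reused as-is; only the insertion head is new.
        hidden, unmask_logits = self.backbone(tokens)
        return unmask_logits, [self.insert_prob(h) for h in hidden]
```

Because the backbone is frozen in architecture and only a thin head is added, this kind of conversion is consistent with the reported cost of fine-tuning an 8B model in a few days on 16 H100s.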