Any-Order Flexible Length Masked Diffusion
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose FlexMDMs, a new discrete diffusion framework that extends masked diffusion models to variable-length sequences through token insertion and unmasking operations, while preserving the any-order generation capability of standard MDMs. This addresses a key limitation of existing MDMs, which are restricted to fixed-length generation.
The authors develop a theoretical foundation by extending the stochastic interpolant framework to discrete state spaces. They introduce a joint interpolant that augments the process with an auxiliary variable tracking token positions, yielding a closed-form characterization of the rate matrix for variable-length sequence generation.
The authors demonstrate that existing pretrained masked diffusion models can be efficiently converted into FlexMDMs through minimal architectural modifications and fine-tuning. This enables scaling to 8B parameters with only three days of training on 16 H100 GPUs, achieving substantial performance improvements on downstream tasks.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[11] Diffusion LLM with native variable generation lengths: Let lead the way PDF
[17] Beyond fixed: Training-free variable-length denoising for diffusion large language models PDF
Contribution Analysis
Detailed comparisons for each claimed contribution
Flexible Masked Diffusion Models (FlexMDMs)
The authors propose FlexMDMs, a new discrete diffusion framework that extends masked diffusion models to variable-length sequences through token insertion and unmasking operations, while preserving the any-order generation capability of standard MDMs. This addresses a key limitation of existing MDMs, which are restricted to fixed-length generation.
[7] Sequential diffusion language models PDF
[40] Block diffusion: Interpolating between autoregressive and diffusion language models PDF
[41] Diffsound: Discrete diffusion model for text-to-sound generation PDF
[42] Diffuseq: Sequence to sequence text generation with diffusion models PDF
[43] Discrete diffusion models for language generation PDF
[44] DiffRhythm: Blazingly fast and embarrassingly simple end-to-end full-length song generation with latent diffusion PDF
[45] dKV-Cache: The cache for diffusion language models PDF
[46] DiffListener: Discrete Diffusion Model for Listener Generation PDF
[47] Non-autoregressive diffusion-based temporal point processes for continuous-time long-term event prediction PDF
[48] Vector quantized diffusion model with codeunet for text-to-sign pose sequences generation PDF
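The insertion-plus-unmasking dynamics claimed above can be illustrated as a single reverse-time sampler step. This is a minimal sketch under simplifying assumptions: the uniform `insert_prob`/`unmask_prob` coins and the `unmask_fn` denoiser stub stand in for the paper's learned rates, and all names are hypothetical.

```python
import random

MASK = "<mask>"

def flexmdm_step(seq, insert_prob, unmask_prob, unmask_fn):
    """One reverse-time step of a FlexMDM-style sampler (illustrative sketch).

    Two operations act in parallel: insertion grows the sequence by adding
    fresh mask slots, and unmasking reveals existing masks in any order.
    A fixed-length MDM performs only the second operation.
    """
    grown = []
    for tok in seq:
        # Insertion operation: possibly add a new mask slot before each token.
        if random.random() < insert_prob:
            grown.append(MASK)
        grown.append(tok)
    # Unmasking operation: any-order reveal of a random subset of mask slots.
    return [unmask_fn(i, grown) if t == MASK and random.random() < unmask_prob else t
            for i, t in enumerate(grown)]
```

With `insert_prob = 0` this reduces to a standard fixed-length MDM step, which is how the framework subsumes the existing models compared against here.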
Joint interpolant framework for variable-length modeling
The authors develop a theoretical foundation by extending the stochastic interpolant framework to discrete state spaces. They introduce a joint interpolant that augments the process with an auxiliary variable tracking token positions, yielding a closed-form characterization of the rate matrix for variable-length sequence generation.
[59] Elimination of systematic error in digital image correlation caused by intensity interpolation by introducing position randomness to subset points PDF
[60] Quantitative properties of sovereign default models: solution methods matter PDF
[61] Stochastic signal processing in adaptive measurement systems with rough space-time statistics: Method of invertible spectral analysis PDF
[62] Atmospheric water vapor maps generations from stochastic interpolation of GNSS Zenith Tropospheric Delays PDF
[63] Uncertainty Analysis of the Dynamic Response of a Randomly Parametrized Corrugated Skin PDF
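The role of the auxiliary position variable can be sketched on the forward (noising) side. In this toy version, each clean token independently survives deletion and may then be masked; the factorized Bernoulli schedule is an assumption made for illustration, not the paper's exact construction, and `pos` is the auxiliary bookkeeping that makes reverse insertion rates tractable.

```python
import random

MASK = "<mask>"

def joint_interpolant(x1, keep_prob, mask_prob):
    """Sample the forward joint interpolant at one noise level (sketch).

    Each clean token of `x1` survives deletion with `keep_prob`; a
    surviving token is replaced by MASK with `mask_prob`. The auxiliary
    variable `pos` records each surviving token's original index, so the
    reverse process knows where deleted tokens must be re-inserted.
    """
    xt, pos = [], []
    for i, tok in enumerate(x1):
        if random.random() < keep_prob:   # token still present at time t
            xt.append(MASK if random.random() < mask_prob else tok)
            pos.append(i)                 # auxiliary position variable
    return xt, pos
```

At `keep_prob = 1` no tokens are deleted and the construction collapses to the ordinary masked-diffusion interpolant over fixed-length sequences.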
Efficient retrofitting of pretrained MDMs into FlexMDMs
The authors demonstrate that existing pretrained masked diffusion models can be efficiently converted into FlexMDMs through minimal architectural modifications and fine-tuning. This enables scaling to 8B parameters with only three days of training on 16 H100 GPUs, achieving substantial performance improvements on downstream tasks.
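The retrofitting claim suggests one natural implementation pattern: reuse the pretrained denoiser unchanged and bolt on a small insertion head initialized so that the wrapped model starts out behaving exactly like the original fixed-length MDM. This is a hedged sketch of that pattern; the class and method names are hypothetical and do not reflect the paper's code.

```python
import math

class RetrofitFlexMDM:
    """Wrap a pretrained MDM denoiser with an insertion head (sketch).

    The backbone's unmasking predictions pass through untouched; a new
    linear head on its hidden states scores where to insert tokens. With
    zero weights and a strongly negative bias, insertion probabilities
    start near zero, so fine-tuning departs from pretrained behaviour.
    """
    def __init__(self, backbone, hidden_dim):
        self.backbone = backbone
        self.w = [0.0] * hidden_dim   # zero-initialized insertion head
        self.b = -10.0                # sigmoid(-10) ~ 0: no insertions yet

    def insert_prob(self, h):
        z = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def forward(self, tokens):
        # Pretrained path is reused as-is; only the insertion head is new.
        hidden, unmask_logits = self.backbone(tokens)
        return unmask_logits, [self.insert_prob(h) for h in hidden]
```

Because the backbone is frozen in architecture and only a thin head is added, this kind of conversion is consistent with the reported cost of fine-tuning an 8B model in a few days on 16 H100s.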