Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: masked diffusion models, language models, inference
Abstract:

Masked diffusion models (MDMs) offer a compelling alternative to autoregressive models (ARMs) for discrete text generation because they enable parallel token sampling rather than sequential, left-to-right generation, promising much faster inference. However, effective parallel sampling faces two competing requirements: (i) simultaneously updated tokens must be conditionally independent, and (ii) updates should prioritise high-confidence predictions. These goals conflict because high-confidence predictions often cluster and depend on each other, limiting opportunities for parallel updates.

We present PUNT, a model-agnostic sampler that reconciles this trade-off. Our method identifies token dependencies and removes lower-confidence tokens from conflicting groups. This produces sets of indices for unmasking that satisfy both independence and confidence criteria. Our approach ensures improved parallel unmasking through approximate conditional independence testing.

Our experiments show that PUNT delivers a superior trade-off between accuracy and compute compared to other strong training-free baselines, especially for generation of longer sequences. On the IFEval benchmark, it achieves up to 16% higher accuracy than baseline methods, including sequential (one-by-one) generation. These gains hold across a range of hyperparameter values, reducing the need for brittle hyperparameter tuning. Moreover, we observe that PUNT induces an emergent hierarchical generation strategy, in which the model first establishes high-level paragraph structure before local refinement, suggesting a planning-like generation process that contributes to strong alignment performance.

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PUNT, a model-agnostic sampler that addresses token dependency conflicts during parallel unmasking in masked diffusion models. It resides in the 'Inference-Time Sampling Policies' leaf of the taxonomy, which contains only three papers total. This leaf sits within the broader 'Sampling Strategies and Scheduling' branch, indicating a relatively sparse research direction focused on inference-only methods that do not require training modifications. The small sibling count suggests this specific problem space—balancing conditional independence and confidence during parallel sampling—has received limited prior attention compared to other branches like application-specific architectures or core model formulations.

The taxonomy reveals neighboring work in 'Training-Aware Sampling Integration' (two papers on learned unmasking policies) and 'Speculative and Multi-Token Decoding' (two papers on multi-token prediction). PUNT diverges from training-aware methods by operating purely at inference time, avoiding the need for path-aligned training or learned policies. It also differs from speculative decoding approaches, which typically predict and validate tokens in a draft-verify framework, whereas PUNT explicitly tests for contextual independence to construct safe parallel unmasking sets. The taxonomy's scope notes clarify that PUNT's inference-only nature excludes it from training-integrated methods, while its focus on dependency resolution distinguishes it from single-token scheduling heuristics.

Among the fifteen candidates examined, none clearly refute any of PUNT's three contributions. The first contribution (contextual independence testing for parallel unmasking) examined five candidates with zero refutations; the second (recursive binary encoding algorithm) examined six with zero refutations; the third (contextual independence criterion) examined four with zero refutations. This limited search scope—fifteen papers from semantic retrieval—suggests that within the examined neighborhood, no prior work explicitly combines dependency testing with confidence-based parallel unmasking in this manner. However, the small candidate pool means the analysis cannot rule out relevant work outside the top-K semantic matches or in adjacent research communities.

Given the sparse taxonomy leaf and absence of refutations among fifteen examined candidates, PUNT appears to occupy a relatively unexplored niche within inference-time sampling policies. The analysis is constrained by the limited search scope and does not cover exhaustive citation networks or domain-specific venues. The novelty assessment reflects what is visible within top-K semantic neighbors, not a comprehensive field survey.

Taxonomy

Core-task Taxonomy Papers: 28
Claimed Contributions: 3
Contribution Candidate Papers Compared: 15
Refutable Papers: 0

Research Landscape Overview

Core task: parallel sampling from masked diffusion models. The field centers on generating discrete or continuous data by iteratively unmasking tokens or patches, enabling faster inference than traditional autoregressive methods.

The taxonomy reveals four main branches: Sampling Strategies and Scheduling explores how to choose which tokens to unmask at each step, ranging from fixed cosine schedules (as in MaskGIT[12]) to learned policies (Learning Unmasking Policies[6]) and dilated or optimal schedules (Dilated Scheduling[5], Optimal Inference Schedules[8]); Model Architectures and Formulations examines the underlying probabilistic frameworks, including simplified formulations (Simplified Masked Diffusion[1]), variational objectives (Variational Masked Diffusion[15]), and self-speculative decoding variants (Self-speculative Masked[2]); Theoretical Analysis and Scaling investigates convergence guarantees, scaling laws, and compute-optimal trade-offs (Scaling Masked Diffusion[4], No Compute Left[19]); and Application-Specific Architectures tailors these models to domains such as motion synthesis (Masked Motion Model[3]), medical imaging (Unified Multi-modal MRI[21]), and recommendation systems (Masked Diffusion Recommendation[27]).

A particularly active line of work focuses on inference-time sampling policies, where researchers seek to balance generation quality and speed by optimizing unmasking schedules. Some studies propose hand-crafted or theoretically motivated schedules (Dilated Scheduling[5], Optimal Inference Schedules[8]), while others learn adaptive policies from data (Learning Unmasking Policies[6]). Parallel Sampling Conditional[0] sits squarely within this inference-time policy cluster, emphasizing conditional generation scenarios where the unmasking strategy must account for external constraints or guidance.
Compared to nearby works like Dilated Scheduling[5], which focuses on deterministic schedule design, and Optimal Inference Schedules[8], which derives schedules from theoretical principles, Parallel Sampling Conditional[0] appears to prioritize flexible, condition-aware sampling that adapts to task-specific requirements. This positions it as a bridge between fixed scheduling heuristics and fully learned policies, addressing practical deployment challenges where conditioning signals vary widely across applications.

Claimed Contributions

PUNT sampler for parallel token unmasking via contextual independence testing

The authors propose PUNT (Parallel Unmasking with Non-influence Tests), a training-free algorithm that identifies sets of contextually independent tokens for parallel unmasking in masked diffusion models. The method uses a divide-and-conquer strategy with O(log m) model calls per step to test for conditional independence, enabling efficient parallel generation while maintaining quality.

5 retrieved papers
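The divide-and-conquer idea described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `toy_model`, the greedy fill, and the threshold `eps` are our assumptions, and the paper's actual method batches these probes so that each denoising step needs only O(log m) forward passes rather than separate calls.

```python
# Hypothetical sketch of a PUNT-style divide-and-conquer independence test.
MASK = None

def toy_model(seq):
    """Stand-in 'denoiser': P(token = 1) at each masked position.
    A position prefers to copy its left neighbour when it is visible,
    which creates a chain of dependencies between adjacent tokens."""
    probs = {}
    for i, tok in enumerate(seq):
        if tok is not MASK:
            continue
        left = seq[i - 1] if i > 0 else None
        if left is MASK or left is None:
            probs[i] = 0.5            # no visible context: uncertain
        else:
            probs[i] = 0.9 if left == 1 else 0.1
    return probs

def influenced(seq, test_pos, probe_set, eps=0.05):
    """Does tentatively unmasking `probe_set` shift test_pos's distribution?"""
    base = toy_model(seq)[test_pos]
    probed = list(seq)
    for j in probe_set:
        probed[j] = 1 if toy_model(seq)[j] >= 0.5 else 0  # greedy fill
    return abs(toy_model(probed)[test_pos] - base) > eps

def punt_select(seq, candidates):
    """Recursively keep only candidates that are mutually non-influencing:
    split the set, resolve each half, then cross-check the halves."""
    if len(candidates) <= 1:
        return list(candidates)
    mid = len(candidates) // 2
    left = punt_select(seq, candidates[:mid])
    right = punt_select(seq, candidates[mid:])
    # Drop right-half tokens whose distribution shifts once the left
    # half is filled in; the (higher-priority) left half survives.
    safe_right = [p for p in right if not influenced(seq, p, left)]
    return left + safe_right

seq = [1, MASK, MASK, MASK]
print(punt_select(seq, [1, 2, 3]))    # → [1]
```

In this toy chain every masked token depends on its left neighbour, so only the first candidate survives and the rest are deferred to later denoising steps, which is the behaviour the contribution describes.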
Efficient recursive algorithm with binary encoding for independence testing

The authors develop an efficient iterative implementation of their recursive independence testing procedure using binary encoding of token positions. This transforms the recursive algorithm into a parallel procedure that requires only O(log |M|) forward evaluations per denoising step, where M is the set of masked tokens.

6 retrieved papers
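A rough sketch of why binary encoding yields O(log |M|) rounds: assign each masked position a rank and split the set by the bits of that rank, so any two distinct positions land on opposite sides of some round. The helper name `bit_partitions` is hypothetical, and the batching of the corresponding forward evaluations is elided.

```python
import math

def bit_partitions(masked_positions):
    """For each bit k, split positions by the k-th bit of their rank.
    Any two distinct ranks differ in some bit, so every pair of
    positions is separated (and hence probed) in at least one round."""
    m = len(masked_positions)
    n_bits = max(1, math.ceil(math.log2(m)))
    rounds = []
    for k in range(n_bits):
        probe = [p for r, p in enumerate(masked_positions) if (r >> k) & 1]
        keep = [p for r, p in enumerate(masked_positions) if not (r >> k) & 1]
        rounds.append((probe, keep))
    return rounds

positions = [4, 7, 9, 12, 15]          # example masked indices
for probe, keep in bit_partitions(positions):
    print(probe, keep)
```

With five masked tokens this produces ceil(log2 5) = 3 rounds, each of which could be served by a single batched forward pass, matching the O(log |M|) evaluation count claimed for the contribution.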
Contextual independence criterion for safe parallel unmasking

The authors formalize contextual independence (Definition 3.1 and 3.2) as the theoretical criterion for determining which tokens can be safely unmasked in parallel. Unlike full statistical independence or confidence-based heuristics, this criterion identifies tokens whose conditional distributions remain unchanged given the current context, ensuring parallel sampling matches sequential sampling.

4 retrieved papers
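The criterion can be illustrated numerically: a token is safe to unmask in parallel if its conditional distribution is (near-)unchanged after tentatively revealing the other tokens. This is a hedged paraphrase of Definitions 3.1/3.2, not the paper's exact test; the tolerance `eps` and the choice of total variation distance are our assumptions.

```python
def tv_distance(p, q):
    """Total variation distance between two categorical distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def contextually_independent(p_base, p_probed, eps=1e-3):
    """Token i may be unmasked alongside the probed set if revealing
    that set leaves its conditional distribution (near-)unchanged."""
    return tv_distance(p_base, p_probed) <= eps

# p(x_i | context) before and after tentatively revealing other tokens:
p_before = [0.70, 0.20, 0.10]
p_after_indep = [0.70, 0.20, 0.10]   # unchanged -> safe to parallelize
p_after_dep = [0.10, 0.20, 0.70]     # shifted   -> defer this token

print(contextually_independent(p_before, p_after_indep))  # True
print(contextually_independent(p_before, p_after_dep))    # False
```

This is weaker than full statistical independence, which matches the contribution's point: only the conditional distributions given the current context need to coincide for parallel sampling to match sequential sampling.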

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: PUNT sampler for parallel token unmasking via contextual independence testing

Contribution 2: Efficient recursive algorithm with binary encoding for independence testing

Contribution 3: Contextual independence criterion for safe parallel unmasking