The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: diffusion language models, diffusion models, large language models, inference-time scaling, predictor-corrector sampling, efficient training
Abstract:

Uniform-state discrete diffusion models excel at few-step generation and guidance due to their inherent ability to self-correct, making them preferable to autoregressive or masked diffusion models in these settings. Yet their sampling efficiency has been limited by reliance on standard posterior samplers, which plateau in quality as steps increase. In this work, we introduce a novel family of Predictor-Corrector (PC) samplers for discrete diffusion models that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers significantly outperform ancestral sampling on both language and vision tasks, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR-10. Crucially, unlike conventional samplers, our PC methods continue to improve generation quality with more sampling steps, narrowing the gap with masked diffusion. Beyond sampling, we develop a fast and memory-efficient curriculum for the Gaussian relaxation phase of Duo++ (our method), which avoids materializing large Gaussian-diffused one-hot vectors. This reduces training time by 25% compared to Duo while maintaining similar validation perplexity on OpenWebText and LM1B and strong downstream performance.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a family of predictor-corrector samplers for uniform-state discrete diffusion models, aiming to improve sampling efficiency beyond standard ancestral methods. It resides in the 'Discrete Predictor-Corrector Frameworks' leaf, which contains only two papers including this one. This sparse population suggests the research direction is relatively nascent, with limited prior work directly addressing predictor-corrector strategies for discrete diffusion. The taxonomy reveals that discrete diffusion sampling remains less explored than its continuous counterpart, where multiple leaves contain diverse acceleration and correction techniques.

The taxonomy tree shows that neighboring leaves include 'Informed Correction Strategies' (model-guided corrections) and 'Discrete Diffusion for Image Synthesis' (application-specific methods). The broader 'Predictor-Corrector Sampling Methods for Discrete Diffusion' branch contains only four leaves total, contrasting sharply with the 'Continuous Diffusion' branch's richer structure of unified frameworks, fast ODE solvers, and training-free acceleration methods. This structural asymmetry indicates that discrete diffusion predictor-corrector methods occupy a less mature research area, with fewer established paradigms and application domains compared to continuous diffusion sampling.

Among 28 candidates examined, the analysis identified potential overlaps for all three contributions. The Ψ-posteriors contribution examined 8 candidates with 1 refutable match, suggesting some prior work on non-Markovian posteriors exists within this limited search scope. The Ψ-samplers contribution examined 10 candidates with 2 refutable matches, indicating more substantial prior exploration of predictor-corrector sampling strategies. The curriculum learning contribution also examined 10 candidates with 1 refutable match. These statistics reflect a constrained literature search rather than exhaustive coverage, meaning additional relevant work may exist beyond the top-30 semantic matches analyzed.

Given the sparse taxonomy leaf and limited search scope, the work appears to address an under-explored niche within discrete diffusion sampling. The presence of refutable candidates across all contributions suggests incremental advancement over existing methods rather than entirely novel territory. However, the small scale of the literature search (28 candidates) and the nascent state of the discrete predictor-corrector subfield leave open the possibility that the work's novelty is more substantial than these signals alone indicate. A broader search would clarify whether the observed overlaps represent fundamental limitations or merely reflect the most semantically similar prior work.

Taxonomy

Core-task Taxonomy Papers: 19
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 4

Research Landscape Overview

Core task: Predictor-corrector sampling for discrete diffusion models. The field organizes around four main branches that reflect both methodological and application-oriented perspectives. The first branch focuses on predictor-corrector frameworks specifically tailored to discrete state spaces, addressing the unique challenges of categorical or token-based generation where continuous interpolation is unavailable. A second branch examines predictor-corrector methods in continuous diffusion settings, encompassing works like UniPC[5], DPM Solver v3[6], and ERA Solver[7] that refine sampling trajectories through corrector steps in image and signal domains. Theoretical foundations form a third branch, exploring convergence guarantees and optimality conditions for these iterative schemes, as seen in Prediction Correction Convergence[11] and Score Optimal Schedules[3]. Finally, application domains span diverse areas from molecular generation (Constrained Molecular Generation[10]) to photoacoustic imaging (Photoacoustic Score Diffusion[8]) and wireless communications (Massive MIMO Diffusion[12]), demonstrating the broad utility of predictor-corrector strategies.

Within the discrete predictor-corrector frameworks, a small handful of works have pioneered the adaptation of continuous-domain ideas to categorical spaces. Predictor Corrector Discrete Diffusion[1] laid early groundwork for iterative refinement in discrete settings, while Informed Correctors[2] introduced mechanisms to leverage domain-specific structure during correction steps. Diffusion Duality Chapter II[0] situates itself closely within this discrete framework cluster, emphasizing the interplay between forward and reverse processes in token-based generation. Compared to Predictor Corrector Discrete Diffusion[1], which established foundational sampling mechanics, the original work appears to delve deeper into duality principles that govern predictor and corrector interactions. Meanwhile, Informed Correctors[2] explores how external knowledge can guide corrections, highlighting a complementary direction where structural priors enhance sampling efficiency. These contrasting emphases (foundational mechanics, duality theory, and informed guidance) illustrate the evolving landscape of discrete predictor-corrector methods and the open questions surrounding optimal correction strategies in non-continuous spaces.

Claimed Contributions

Ψ-posteriors: a family of non-Markovian posteriors for discrete diffusion with arbitrary noise priors

The authors introduce Ψ-posteriors, which are superposition posteriors that linearly combine the forward process and reverse posteriors of discrete diffusion models. These posteriors maintain the same marginals as standard Markovian diffusion processes while enabling predictor-corrector sampling capabilities for arbitrary noise distributions, generalizing prior methods to both masked and uniform-state diffusion.

8 retrieved papers
Can Refute
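The key property claimed above (a convex combination of the forward marginal and the reverse posterior that leaves the time-s marginals unchanged) can be illustrated numerically. The sketch below is a hypothetical toy construction, not the paper's parameterization: the interpolating forward kernel, the function `psi_posterior`, and the mixing weight `psi` are illustrative assumptions, shown here for a uniform noise prior.

```python
import numpy as np

K = 4                      # toy vocabulary size
prior = np.full(K, 1 / K)  # uniform-state noise prior

def onehot(i):
    v = np.zeros(K)
    v[i] = 1.0
    return v

def marginal(x, alpha):
    # Interpolating forward process: q(z | x) = alpha * onehot(x) + (1 - alpha) * prior
    return alpha * onehot(x) + (1 - alpha) * prior

def psi_posterior(z_t, x, alpha_s, alpha_t, psi):
    """Illustrative superposition posterior: a convex mix of the forward
    marginal q(z_s | x) and the ancestral reverse posterior q(z_s | z_t, x)."""
    alpha_ts = alpha_t / alpha_s  # signal retained between times s < t
    # Likelihood q(z_t | z_s = k), evaluated for every candidate k
    lik = alpha_ts * (np.arange(K) == z_t) + (1 - alpha_ts) * prior[z_t]
    reverse = lik * marginal(x, alpha_s)
    reverse /= reverse.sum()
    return psi * marginal(x, alpha_s) + (1 - psi) * reverse

# Marginal preservation: averaging the mixed posterior over z_t ~ q(z_t | x)
# recovers q(z_s | x) exactly, for any mixing weight psi in [0, 1].
x, alpha_t, alpha_s, psi = 2, 0.3, 0.6, 0.7
q_t = marginal(x, alpha_t)
mixed = sum(q_t[z] * psi_posterior(z, x, alpha_s, alpha_t, psi) for z in range(K))
assert np.allclose(mixed, marginal(x, alpha_s))
```

Because each mixture component individually averages to q(z_s | x) under z_t ~ q(z_t | x), any convex combination does too, which is why the check holds for every psi.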
Ψ-samplers: predictor-corrector samplers that improve generation quality with more sampling steps

The authors develop Ψ-samplers derived from Ψ-posteriors that enable error correction during generation by allowing tokens to be revised. Unlike conventional ancestral samplers that plateau in quality, these samplers continue to improve generation quality as the number of sampling steps increases, closing the performance gap with masked diffusion models in high-step regimes.

10 retrieved papers
Can Refute
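A minimal predictor-corrector loop consistent with this description might look as follows. This is a hedged sketch, not the paper's algorithm: `denoiser` is a random stand-in for the learned model, the corrector branch and mixing weight `psi` are illustrative, and the linear noise schedule is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, T = 8, 16, 32  # vocab size, sequence length, sampling steps

def denoiser(z_t, alpha_t):
    """Stand-in for the learned denoiser p(x | z_t): random per-position
    categoricals, so the loop runs end to end without a trained model."""
    logits = rng.standard_normal((L, K))
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return p / p.sum(axis=-1, keepdims=True)

def pc_step(z_t, alpha_s, alpha_t, psi):
    p_x = denoiser(z_t, alpha_t)
    alpha_ts = alpha_t / alpha_s
    z_s = np.empty(L, dtype=np.int64)
    for i in range(L):
        if rng.random() < psi:
            # Corrector branch: resample from the denoiser-weighted forward
            # marginal, letting previously committed tokens be revised.
            probs = alpha_s * p_x[i] + (1 - alpha_s) / K
        else:
            # Predictor branch: ancestral posterior with x marginalized
            # under the denoiser's prediction.
            lik = alpha_ts * (np.arange(K) == z_t[i]) + (1 - alpha_ts) / K
            probs = lik * (alpha_s * p_x[i] + (1 - alpha_s) / K)
            probs = probs / probs.sum()
        z_s[i] = rng.choice(K, p=probs)
    return z_s

# Denoise from pure uniform noise (alpha near 0) toward clean data (alpha near 1).
alphas = np.linspace(1e-3, 0.999, T + 1)
z = rng.integers(0, K, size=L)
for t in range(T):
    z = pc_step(z, alpha_s=alphas[t + 1], alpha_t=alphas[t], psi=0.3)
assert z.shape == (L,)
```

The design point this sketch isolates is that the corrector branch never freezes a token: unlike a purely ancestral sampler, every position remains revisable at every step, which is the mechanism credited above for continued gains with more steps.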
Fast and memory-efficient curriculum learning strategy for uniform-state diffusion

The authors propose an efficient curriculum learning approach that avoids materializing large Gaussian-diffused one-hot vectors by simulating only the top-k entries using order statistics and approximating the normalization constant. This reformulation maintains similar validation perplexity and downstream performance while substantially reducing computational costs compared to the original curriculum method.

10 retrieved papers
Can Refute
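The order-statistics idea can be illustrated in its simplest (top-1) form: to learn the largest of the K−1 Gaussian noise coordinates of a diffused one-hot vector, one can sample the maximum directly through its inverse CDF, P(max ≤ m) = Φ(m/σ)^(K−1), instead of materializing all K−1 draws. The sketch below is our own top-1 illustration under these assumptions (the function names are hypothetical), not the paper's curriculum, which simulates the top-k entries and approximates the normalization constant.

```python
import numpy as np
from statistics import NormalDist

phi_inv = NormalDist().inv_cdf  # standard normal inverse CDF
rng = np.random.default_rng(0)

def naive_max_noise(K, sigma, n):
    # Materialize all K-1 off-signal Gaussian coordinates, keep the max.
    # Memory cost: O(n * K) -- this is what an efficient curriculum avoids.
    return sigma * rng.standard_normal((n, K - 1)).max(axis=1)

def order_stat_max_noise(K, sigma, n):
    # Sample the max directly: P(max <= m) = Phi(m / sigma)^(K-1),
    # so m = sigma * Phi^-1(U^(1/(K-1))) with U ~ Uniform(0, 1).
    # Memory cost: O(n), independent of the vocabulary size K.
    u = np.clip(rng.random(n), 1e-12, 1 - 1e-12)
    return sigma * np.array([phi_inv(v) for v in u ** (1.0 / (K - 1))])

K, sigma, n = 256, 1.0, 5000
a = naive_max_noise(K, sigma, n)
b = order_stat_max_noise(K, sigma, n)
# Both estimators target the same distribution of the maximum.
assert abs(a.mean() - b.mean()) < 0.1
```

For a language-model vocabulary (K in the tens of thousands), the O(n) route avoids allocating the full Gaussian-diffused one-hot tensor, which is the memory saving the contribution describes.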

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

