The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: diffusion language models, diffusion models, large language models, inference-time scaling, predictor-corrector sampling, efficient training
Abstract:

Uniform-state discrete diffusion models excel at few-step generation and guidance due to their inherent ability to self-correct, making them preferable to autoregressive or masked diffusion models in these settings. Yet their sampling efficiency has been limited by reliance on standard posterior samplers, which plateau in quality as steps increase. In this work, we introduce a novel family of Predictor-Corrector (PC) samplers for discrete diffusion models that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers significantly outperform ancestral sampling on both language and vision tasks, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR-10. Crucially, unlike conventional samplers, our PC methods continue to improve generation quality with more sampling steps, narrowing the gap with masked diffusion. Beyond sampling, we develop a fast and memory-efficient curriculum for the Gaussian relaxation phase of Duo++ (our method), which avoids materializing large Gaussian-diffused one-hot vectors. This reduces training time by 25% compared to Duo while maintaining similar validation perplexity on OpenWebText and LM1B and strong downstream performance.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a family of predictor-corrector samplers for uniform-state discrete diffusion models, aiming to improve sampling efficiency beyond standard ancestral methods. It resides in the 'Discrete Predictor-Corrector Frameworks' leaf, which contains only two papers including this one. This sparse population suggests the research direction is relatively nascent, with limited prior work directly addressing predictor-corrector strategies for discrete diffusion. The taxonomy reveals that discrete diffusion sampling remains less explored than its continuous counterpart, where multiple leaves contain diverse acceleration and correction techniques.

The taxonomy tree shows that neighboring leaves include 'Informed Correction Strategies' (model-guided corrections) and 'Discrete Diffusion for Image Synthesis' (application-specific methods). The broader 'Predictor-Corrector Sampling Methods for Discrete Diffusion' branch contains only four leaves total, contrasting sharply with the 'Continuous Diffusion' branch's richer structure of unified frameworks, fast ODE solvers, and training-free acceleration methods. This structural asymmetry indicates that discrete diffusion predictor-corrector methods occupy a less mature research area, with fewer established paradigms and application domains compared to continuous diffusion sampling.

Among 28 candidates examined, the analysis identified potential overlaps for all three contributions. The Ψ-posteriors contribution examined 8 candidates with 1 refutable match, suggesting some prior work on non-Markovian posteriors exists within this limited search scope. The Ψ-samplers contribution examined 10 candidates with 2 refutable matches, indicating more substantial prior exploration of predictor-corrector sampling strategies. The curriculum learning contribution also examined 10 candidates with 1 refutable match. These statistics reflect a constrained literature search rather than exhaustive coverage, meaning additional relevant work may exist beyond the top-30 semantic matches analyzed.

Given the sparse taxonomy leaf and limited search scope, the work appears to address an under-explored niche within discrete diffusion sampling. The presence of refutable candidates across all contributions suggests incremental advancement over existing methods rather than entirely novel territory. However, the small scale of the literature search (28 candidates) and the nascent state of the discrete predictor-corrector subfield leave open the possibility that the work's novelty is more substantial than these signals alone indicate. A broader search would clarify whether the observed overlaps represent fundamental limitations or merely reflect the most semantically similar prior work.

Taxonomy

Core-task Taxonomy Papers: 19
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 4

Research Landscape Overview

Core task: Predictor-corrector sampling for discrete diffusion models. The field organizes around four main branches that reflect both methodological and application-oriented perspectives. The first branch focuses on predictor-corrector frameworks specifically tailored to discrete state spaces, addressing the unique challenges of categorical or token-based generation where continuous interpolation is unavailable. A second branch examines predictor-corrector methods in continuous diffusion settings, encompassing works like UniPC[5], DPM Solver v3[6], and ERA Solver[7] that refine sampling trajectories through corrector steps in image and signal domains. Theoretical foundations form a third branch, exploring convergence guarantees and optimality conditions for these iterative schemes, as seen in Prediction Correction Convergence[11] and Score Optimal Schedules[3]. Finally, application domains span diverse areas from molecular generation (Constrained Molecular Generation[10]) to photoacoustic imaging (Photoacoustic Score Diffusion[8]) and wireless communications (Massive MIMO Diffusion[12]), demonstrating the broad utility of predictor-corrector strategies.

Within the discrete predictor-corrector frameworks, a small handful of works have pioneered the adaptation of continuous-domain ideas to categorical spaces. Predictor Corrector Discrete Diffusion[1] laid early groundwork for iterative refinement in discrete settings, while Informed Correctors[2] introduced mechanisms to leverage domain-specific structure during correction steps. Diffusion Duality Chapter II[0] situates itself closely within this discrete framework cluster, emphasizing the interplay between forward and reverse processes in token-based generation. Compared to Predictor Corrector Discrete Diffusion[1], which established foundational sampling mechanics, the original work appears to delve deeper into duality principles that govern predictor and corrector interactions. Meanwhile, Informed Correctors[2] explores how external knowledge can guide corrections, highlighting a complementary direction where structural priors enhance sampling efficiency. These contrasting emphases (foundational mechanics, duality theory, and informed guidance) illustrate the evolving landscape of discrete predictor-corrector methods and the open questions surrounding optimal correction strategies in non-continuous spaces.

Claimed Contributions

Ψ-posteriors: a family of non-Markovian posteriors for discrete diffusion with arbitrary noise priors

The authors introduce Ψ-posteriors, which are superposition posteriors that linearly combine the forward process and reverse posteriors of discrete diffusion models. These posteriors maintain the same marginals as standard Markovian diffusion processes while enabling predictor-corrector sampling capabilities for arbitrary noise distributions, generalizing prior methods to both masked and uniform-state diffusion.

8 retrieved papers
Can Refute
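The key property claimed above (a convex combination of the forward marginal and the reverse posterior that leaves the time-s marginals unchanged) can be illustrated numerically. The sketch below is a hypothetical toy construction, not the paper's parameterization: the interpolating forward kernel, the function `psi_posterior`, and the mixing weight `psi` are illustrative assumptions, shown here for a uniform noise prior.

```python
import numpy as np

K = 4                      # toy vocabulary size
prior = np.full(K, 1 / K)  # uniform-state noise prior

def onehot(i):
    v = np.zeros(K)
    v[i] = 1.0
    return v

def marginal(x, alpha):
    # Interpolating forward process: q(z | x) = alpha * onehot(x) + (1 - alpha) * prior
    return alpha * onehot(x) + (1 - alpha) * prior

def psi_posterior(z_t, x, alpha_s, alpha_t, psi):
    """Illustrative superposition posterior: a convex mix of the forward
    marginal q(z_s | x) and the ancestral reverse posterior q(z_s | z_t, x)."""
    alpha_ts = alpha_t / alpha_s  # signal retained between times s < t
    # Likelihood q(z_t | z_s = k), evaluated for every candidate k
    lik = alpha_ts * (np.arange(K) == z_t) + (1 - alpha_ts) * prior[z_t]
    reverse = lik * marginal(x, alpha_s)
    reverse /= reverse.sum()
    return psi * marginal(x, alpha_s) + (1 - psi) * reverse

# Marginal preservation: averaging the mixed posterior over z_t ~ q(z_t | x)
# recovers q(z_s | x) exactly, for any mixing weight psi in [0, 1].
x, alpha_t, alpha_s, psi = 2, 0.3, 0.6, 0.7
q_t = marginal(x, alpha_t)
mixed = sum(q_t[z] * psi_posterior(z, x, alpha_s, alpha_t, psi) for z in range(K))
assert np.allclose(mixed, marginal(x, alpha_s))
```

Because each mixture component individually averages to q(z_s | x) under z_t ~ q(z_t | x), any convex combination does too, which is why the check holds for every psi.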
Ψ-samplers: predictor-corrector samplers that improve generation quality with more sampling steps

The authors develop Ψ-samplers derived from Ψ-posteriors that enable error correction during generation by allowing tokens to be revised. Unlike conventional ancestral samplers that plateau in quality, these samplers continue to improve generation quality as the number of sampling steps increases, closing the performance gap with masked diffusion models in high-step regimes.

10 retrieved papers
Can Refute
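A minimal predictor-corrector loop consistent with this description might look as follows. This is a hedged sketch, not the paper's algorithm: `denoiser` is a random stand-in for the learned model, the corrector branch and mixing weight `psi` are illustrative, and the linear noise schedule is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L, T = 8, 16, 32  # vocab size, sequence length, sampling steps

def denoiser(z_t, alpha_t):
    """Stand-in for the learned denoiser p(x | z_t): random per-position
    categoricals, so the loop runs end to end without a trained model."""
    logits = rng.standard_normal((L, K))
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return p / p.sum(axis=-1, keepdims=True)

def pc_step(z_t, alpha_s, alpha_t, psi):
    p_x = denoiser(z_t, alpha_t)
    alpha_ts = alpha_t / alpha_s
    z_s = np.empty(L, dtype=np.int64)
    for i in range(L):
        if rng.random() < psi:
            # Corrector branch: resample from the denoiser-weighted forward
            # marginal, letting previously committed tokens be revised.
            probs = alpha_s * p_x[i] + (1 - alpha_s) / K
        else:
            # Predictor branch: ancestral posterior with x marginalized
            # under the denoiser's prediction.
            lik = alpha_ts * (np.arange(K) == z_t[i]) + (1 - alpha_ts) / K
            probs = lik * (alpha_s * p_x[i] + (1 - alpha_s) / K)
            probs = probs / probs.sum()
        z_s[i] = rng.choice(K, p=probs)
    return z_s

# Denoise from pure uniform noise (alpha near 0) toward clean data (alpha near 1).
alphas = np.linspace(1e-3, 0.999, T + 1)
z = rng.integers(0, K, size=L)
for t in range(T):
    z = pc_step(z, alpha_s=alphas[t + 1], alpha_t=alphas[t], psi=0.3)
assert z.shape == (L,)
```

The design point this sketch isolates is that the corrector branch never freezes a token: unlike a purely ancestral sampler, every position remains revisable at every step, which is the mechanism credited above for continued gains with more steps.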
Fast and memory-efficient curriculum learning strategy for uniform-state diffusion

The authors propose an efficient curriculum learning approach that avoids materializing large Gaussian-diffused one-hot vectors by simulating only the top-k entries using order statistics and approximating the normalization constant. This reformulation maintains similar validation perplexity and downstream performance while substantially reducing computational costs compared to the original curriculum method.

10 retrieved papers
Can Refute
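The order-statistics idea can be illustrated in its simplest (top-1) form: to learn the largest of the K−1 Gaussian noise coordinates of a diffused one-hot vector, one can sample the maximum directly through its inverse CDF, P(max ≤ m) = Φ(m/σ)^(K−1), instead of materializing all K−1 draws. The sketch below is our own top-1 illustration under these assumptions (the function names are hypothetical), not the paper's curriculum, which simulates the top-k entries and approximates the normalization constant.

```python
import numpy as np
from statistics import NormalDist

phi_inv = NormalDist().inv_cdf  # standard normal inverse CDF
rng = np.random.default_rng(0)

def naive_max_noise(K, sigma, n):
    # Materialize all K-1 off-signal Gaussian coordinates, keep the max.
    # Memory cost: O(n * K) -- this is what an efficient curriculum avoids.
    return sigma * rng.standard_normal((n, K - 1)).max(axis=1)

def order_stat_max_noise(K, sigma, n):
    # Sample the max directly: P(max <= m) = Phi(m / sigma)^(K-1),
    # so m = sigma * Phi^-1(U^(1/(K-1))) with U ~ Uniform(0, 1).
    # Memory cost: O(n), independent of the vocabulary size K.
    u = np.clip(rng.random(n), 1e-12, 1 - 1e-12)
    return sigma * np.array([phi_inv(v) for v in u ** (1.0 / (K - 1))])

K, sigma, n = 256, 1.0, 5000
a = naive_max_noise(K, sigma, n)
b = order_stat_max_noise(K, sigma, n)
# Both estimators target the same distribution of the maximum.
assert abs(a.mean() - b.mean()) < 0.1
```

For a language-model vocabulary (K in the tens of thousands), the O(n) route avoids allocating the full Gaussian-diffused one-hot tensor, which is the memory saving the contribution describes.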

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

