DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: text diffusion model; diffusion large language model; code generation
Abstract:

Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. The global planning and iterative refinement features of dLLMs are particularly useful for code generation. However, current training and inference mechanisms for dLLMs in coding are still under-explored. To demystify the decoding behavior of dLLMs and unlock their potential for coding, we systematically investigate their denoising processes and reinforcement learning (RL) methods. We train a 7B dLLM, DiffuCoder, on 130B tokens of code. Using this model as a testbed, we analyze its decoding behavior, revealing how it differs from that of AR models: (1) dLLMs can decide how causal their generation should be without relying on semi-AR decoding, and (2) increasing the sampling temperature diversifies not only token choices but also their generation order. This diversity creates a rich search space for RL rollouts. For RL training, to reduce the variance of token log-likelihood estimates and maintain training efficiency, we propose coupled-GRPO, a novel sampling scheme that constructs complementary mask noise for completions used in training. In our experiments, coupled-GRPO significantly improves DiffuCoder's performance on code generation benchmarks (+4.4% on EvalPlus) and reduces reliance on AR bias during decoding. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces DiffuCoder, a 7B diffusion language model trained on 130B code tokens, and proposes coupled-GRPO, a reinforcement learning algorithm tailored for diffusion-based code generation. According to the taxonomy, this work resides in the 'Trajectory-Level Reinforcement Learning' leaf under the broader 'Reinforcement Learning and Optimization for Diffusion Models' branch. This leaf contains only two papers total, including the original work, indicating a relatively sparse research direction within the masked diffusion for code generation landscape.

The taxonomy reveals that neighboring leaves explore alternative optimization strategies: 'Latent Policy Adaptation and Reward-Guided Decoding' focuses on external reward models guiding decoding, while 'Distillation and Acceleration via Reinforcement Learning' emphasizes efficiency through distillation. The sibling paper in the same leaf, Lateral Thought Diffusion, shares the trajectory-level optimization theme but may target broader sequential reasoning contexts. Meanwhile, the 'Core Diffusion Architectures' and 'Inference and Sampling Strategies' branches address orthogonal concerns—foundational training mechanisms and decoding algorithms—suggesting DiffuCoder's RL contributions occupy a distinct methodological niche.

Among the 30 candidates examined across the three contributions, the DiffuCoder model contribution has one refutable candidate out of ten, suggesting some prior work on large-scale diffusion models for code exists. The local/global AR-ness metrics contribution has no refutable candidates among its ten, indicating potential novelty in analyzing diffusion decoding behavior. The coupled-GRPO algorithm has two refutable candidates out of ten, implying moderate overlap with existing RL methods for diffusion models. These statistics reflect a limited semantic search scope, not exhaustive coverage of the relevant literature.

Based on the top-30 semantic matches examined, the work appears to occupy a moderately explored intersection of diffusion models and reinforcement learning for code generation. The trajectory-level RL focus sits in a sparse taxonomy leaf, though the broader RL-for-diffusion branch contains related efforts. The analysis does not cover potential work outside the semantic search radius or recent preprints, leaving open questions about comprehensiveness in rapidly evolving diffusion model research.

Taxonomy

Core-task Taxonomy Papers: 16
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 3

Research Landscape Overview

Core task: masked diffusion models for code generation. The field organizes around four main branches that reflect different aspects of applying diffusion techniques to discrete code synthesis. Core Diffusion Architectures and Training Frameworks establish foundational masking and denoising mechanisms, often exploring how to adapt continuous diffusion principles to token-level generation (e.g., CodeFusion[6], Soft-Masked Diffusion[8]). Inference and Sampling Strategies address how to efficiently decode from learned diffusion models, including scheduling variants like Dilated Scheduling[11] and lookahead techniques such as Lookahead Unmasking[3]. Reinforcement Learning and Optimization for Diffusion Models investigates trajectory-level or policy-based refinements to improve sample quality and task-specific performance. Finally, Theoretical Analysis and Comparative Studies examines trade-offs between diffusion and autoregressive paradigms, as seen in works like Diffusion vs Autoregression[15], providing empirical and conceptual grounding for design choices.

Within the reinforcement learning branch, a small cluster of works explores trajectory-level optimization to guide diffusion sampling toward higher-quality outputs. DiffuCoder[0] sits squarely in this area, emphasizing RL-driven refinement of masked diffusion trajectories for code generation tasks. It shares thematic overlap with Lateral Thought Diffusion[9], which similarly leverages trajectory-level reasoning, though the latter may focus on broader sequential decision-making contexts. Meanwhile, neighboring efforts like Latent Adaptation Masked Policy[5] investigate policy adaptation in latent spaces, highlighting an ongoing tension between end-to-end RL tuning and modular latent interventions. These contrasting approaches reflect open questions about where and how to inject optimization signals: at the token unmasking level, across entire generation rollouts, or within learned latent representations. They underscore the evolving interplay between diffusion mechanics and reinforcement learning in discrete generation domains.
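The masking-and-denoising mechanism described in this overview can be illustrated with a toy decoding loop: start from a fully masked sequence and repeatedly let a denoiser propose tokens for every masked slot, committing only the most confident predictions. Everything below is an illustrative assumption (the random `toy_model`, the fixed confidence-based schedule); it is a generic masked-diffusion sampler sketch, not the exact procedure of any paper cited above.

```python
import numpy as np

MASK = -1   # sentinel id for a masked position
VOCAB = 10  # toy vocabulary size
rng = np.random.default_rng(0)

def toy_model(tokens):
    """Stand-in for a masked diffusion denoiser: returns per-position
    logits over the vocabulary. A real dLLM would condition on the
    whole partially masked sequence; here we just sample logits."""
    return rng.normal(size=(len(tokens), VOCAB))

def diffusion_decode(length=8, steps=4):
    """Iteratively unmask a sequence: at each step, score all masked
    positions and commit the most confident ones (confidence-based
    parallel decoding, one common dLLM sampling strategy)."""
    tokens = np.full(length, MASK)
    per_step = length // steps  # tokens committed per denoising step
    order = []                  # record the unmasking order
    for _ in range(steps):
        logits = toy_model(tokens)
        masked = np.flatnonzero(tokens == MASK)
        conf = logits[masked].max(axis=1)              # confidence per masked slot
        chosen = masked[np.argsort(-conf)[:per_step]]  # most confident first
        tokens[chosen] = logits[chosen].argmax(axis=1)
        order.extend(chosen.tolist())
    return tokens, order

tokens, order = diffusion_decode()
print(tokens)  # all positions filled, no MASK left
print(order)   # the (possibly non-left-to-right) generation order
```

Because every step scores all remaining masked positions at once, the committed order need not be left-to-right, which is exactly the degree of freedom the AR-ness analysis and the RL branch above operate on.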

Claimed Contributions

DiffuCoder: 7B diffusion model for code generation

The authors train DiffuCoder, a 7-billion parameter masked diffusion language model specialized for code generation, trained on 130B tokens. This model serves as a testbed for analyzing diffusion model behavior and developing new training methods.

10 retrieved papers
Can Refute
Local and global AR-ness metrics for analyzing diffusion decoding

The authors propose two metrics to quantify how closely diffusion models follow autoregressive (left-to-right) generation patterns. These metrics reveal that diffusion models can adaptively decide their generation order and that higher sampling temperatures increase non-autoregressive behavior.

10 retrieved papers
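The report describes these metrics only informally. As one possible formalization (our assumption, not the paper's exact definitions), both can be computed from the recorded unmasking order: local AR-ness as how often a committed position directly follows the previously committed one, and global AR-ness as how often the decoder commits the leftmost still-masked position, i.e. behaves like a pure autoregressor.

```python
def local_ar_ness(order):
    """Fraction of steps whose committed position immediately follows
    the previously committed one (strict left-to-right continuation)."""
    if len(order) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(order, order[1:]) if cur == prev + 1)
    return hits / (len(order) - 1)

def global_ar_ness(order):
    """Fraction of steps that commit the leftmost still-masked position,
    matching what a purely autoregressive decoder would do."""
    remaining = set(range(len(order)))
    hits = 0
    for pos in order:
        if pos == min(remaining):
            hits += 1
        remaining.remove(pos)
    return hits / len(order)

print(local_ar_ness([0, 1, 2, 3]))   # 1.0: pure left-to-right decoding
print(global_ar_ness([0, 2, 1, 3]))  # 0.75: one out-of-order commit
```

Under this formalization, sweeping the sampling temperature and plotting both scores over the decoding orders would surface the temperature effect the contribution describes.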
Coupled-GRPO: diffusion-native reinforcement learning algorithm

The authors develop coupled-GRPO, a reinforcement learning method tailored for diffusion models that uses complementary mask noise pairs to reduce variance in token likelihood estimation while maintaining training efficiency. This method respects the non-autoregressive nature of diffusion models and significantly improves performance.

10 retrieved papers
Can Refute
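A minimal sketch of the complementary-mask idea, under our reading of the summary above: draw one random mask over the completion and pair it with its complement, so each token's log-likelihood contribution is evaluated exactly once across the pair rather than left to independent Monte Carlo masking. The function names are hypothetical, and GRPO's group-relative advantage computation is deliberately omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def coupled_masks(seq_len, mask_ratio=0.5):
    """Draw one random mask and its complement so that, across the
    pair, every completion token is masked (and thus scored) exactly
    once: the 'complementary mask noise' idea in coupled sampling."""
    perm = rng.permutation(seq_len)
    cut = int(seq_len * mask_ratio)
    mask_a = np.zeros(seq_len, dtype=bool)
    mask_a[perm[:cut]] = True
    return mask_a, ~mask_a

def coupled_logprob_estimate(token_logprobs, masks):
    """Sum per-token log-likelihood contributions over the coupled
    masks. token_logprobs[i][t] stands in for the model's log-prob of
    token t when it is masked under mask i."""
    total = 0.0
    for lp, mask in zip(token_logprobs, masks):
        total += lp[mask].sum()
    return total  # every token counted exactly once across the pair

mask_a, mask_b = coupled_masks(8)
est = coupled_logprob_estimate([np.ones(8), np.ones(8)], (mask_a, mask_b))
print(est)  # 8.0: each of the 8 tokens scored exactly once
```

The coupling is what reduces estimator variance relative to two independent masks, where some tokens would be scored twice and others not at all; the resulting sequence-level estimates would then feed a GRPO-style objective.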

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

DiffuCoder: 7B diffusion model for code generation

The authors train DiffuCoder, a 7-billion parameter masked diffusion language model specialized for code generation, trained on 130B tokens. This model serves as a testbed for analyzing diffusion model behavior and developing new training methods.

Contribution

Local and global AR-ness metrics for analyzing diffusion decoding

The authors propose two metrics to quantify how closely diffusion models follow autoregressive (left-to-right) generation patterns. These metrics reveal that diffusion models can adaptively decide their generation order and that higher sampling temperatures increase non-autoregressive behavior.

Contribution

Coupled-GRPO: diffusion-native reinforcement learning algorithm

The authors develop coupled-GRPO, a reinforcement learning method tailored for diffusion models that uses complementary mask noise pairs to reduce variance in token likelihood estimation while maintaining training efficiency. This method respects the non-autoregressive nature of diffusion models and significantly improves performance.