Why Adversarially Train Diffusion Models?

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: Robustness, Adversarial training
Abstract:

Adversarial Training (AT) is a powerful, well-established technique for improving classifier robustness to input perturbations, yet its applicability beyond discriminative settings remains limited. Motivated by the widespread use of score-based generative models and their need to operate robustly on substantially noisy or corrupted input data, we propose an adaptation of AT for these models, providing a thorough empirical assessment. We introduce a principled formulation of AT for Diffusion Models (DMs) that replaces the conventional invariance objective with an equivariance constraint aligned to the denoising dynamics of score matching. Our method integrates seamlessly into diffusion training by adding either random perturbations (similar to randomized smoothing) or adversarial ones (akin to AT). Our approach offers several advantages: (a) tolerance to heavy noise and corruption, (b) reduced memorization, (c) robustness to outliers and extreme data variability, and (d) resilience to iterative adversarial attacks. We validate these claims on proof-of-concept low- and high-dimensional datasets with known ground-truth distributions, enabling precise error analysis. We further evaluate on standard benchmarks (CIFAR-10, CelebA, and LSUN Bedroom), where our approach shows improved robustness and preserved sample fidelity under severe noise, data corruption, and adversarial evaluation. Code available upon acceptance.
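The abstract's two training modes (random perturbations, as in randomized smoothing, and adversarial ones, as in AT) can be illustrated with a generic perturbed denoising-score-matching step. The sketch below is a minimal NumPy illustration under stated assumptions: the "model" is a toy linear noise predictor x ↦ xWᵀ so the inner-maximization gradient has a closed form, and the function name `adversarial_dsm_loss`, the PGD step size, and the budget `eps` are all hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def adversarial_dsm_loss(W, x0, sigma, eps=0.1, pgd_steps=3,
                         adversarial=True, rng=None):
    """Hedged sketch of one perturbed denoising-score-matching step.

    The model is a toy linear noise predictor x -> x @ W.T, so the PGD
    gradient of the squared loss can be written in closed form. All
    names and step sizes are illustrative only.
    """
    rng = np.random.default_rng(rng)
    noise = rng.standard_normal(x0.shape)
    x_t = x0 + sigma * noise                      # forward corruption

    if adversarial:
        # PGD-style inner maximization of the denoising loss (akin to AT)
        delta = np.zeros_like(x_t)
        for _ in range(pgd_steps):
            resid = (x_t + delta) @ W.T - noise   # prediction minus target
            grad = 2.0 * resid @ W / resid.size   # d(loss)/d(delta)
            delta = np.clip(delta + (eps / pgd_steps) * np.sign(grad),
                            -eps, eps)
        x_in = x_t + delta
    else:
        # random perturbation, similar to randomized smoothing
        x_in = x_t + eps * rng.standard_normal(x_t.shape)

    # outer minimization target: fit the noise on the perturbed input
    resid = x_in @ W.T - noise
    return np.mean(resid ** 2)
```

Because the toy loss is convex in the perturbation, each sign-gradient ascent step cannot decrease it, so the adversarial variant upper-bounds the unperturbed loss on the same noise draw.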

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a principled adaptation of adversarial training for diffusion models, replacing the conventional invariance objective with an equivariance constraint aligned to denoising dynamics. It resides in the 'Adversarial Training Formulations for Diffusion Models' leaf, which contains only two papers total (including this one). This is a notably sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting the specific formulation of adversarial training for diffusion models remains relatively underexplored compared to adjacent areas like adversarial purification or training from corrupted data.

The taxonomy reveals that this work sits within the 'Adversarial Training and Robustness Enhancement' branch, which contrasts with neighboring branches focused on training from corrupted data (e.g., ambient diffusion, GSURE-based methods) and test-time adaptation. While sibling categories address adversarial purification via diffusion or robustness to common corruptions, this leaf specifically targets training-time formulations that build inherent robustness through adversarial perturbations. The scope note explicitly excludes purification methods and test-time adaptation, positioning this work as concerned with worst-case robustness during model learning rather than post-hoc defense or statistical corruption handling.

Among 18 candidates examined across three contributions, the analysis found 5 refutable pairs. The claim of 'first formal introduction of adversarial training to denoising and diffusion models' examined 10 candidates and identified 3 potential refutations, suggesting prior work exists in this space. The 'principled formulation with equivariance constraint' examined 4 candidates with 1 refutation, while the 'adversarial training algorithm for score-based models' also examined 4 candidates with 1 refutation. These statistics indicate that while the specific equivariance formulation may offer novelty, the broader concept of adversarial training for diffusion models has been explored in the limited literature examined.

Based on the top-18 semantic matches examined, the work appears to contribute a specific technical formulation within an emerging but not entirely new research direction. The limited search scope means the analysis captures nearby prior work but cannot claim exhaustive coverage. The sparse population of the taxonomy leaf (2 papers) suggests either genuine novelty in this precise formulation or that the field is still consolidating around terminology and problem framing.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 18
Refutable papers: 5

Research Landscape Overview

Core task: adversarial training for diffusion models under data corruption. The field addresses how diffusion models can be trained and deployed when the available data is noisy, incomplete, or otherwise degraded. The taxonomy reveals five main branches:

- Training Diffusion Models from Corrupted Data: methods that learn generative models directly from imperfect observations, often using techniques like ambient diffusion or expectation-maximization to handle missing or corrupted measurements (e.g., Ambient Diffusion Posterior[2], EM Clean Diffusion[6]).
- Adversarial Training and Robustness Enhancement: formulations that explicitly incorporate adversarial perturbations or robust loss functions to improve model resilience (e.g., Adversarial Diffusion Training[0], Adversarial Training Diffusion[20]).
- Test-Time Adaptation and Domain Shift Handling: strategies for adjusting models when deployment conditions differ from training, including covariate shift and unknown degradations (e.g., Test-Time Corruption Adaptation[18], Unknown Degradation Adaptation[49]).
- Domain-Specific Applications of Robust Diffusion: applications of these ideas to specialized tasks such as medical imaging, speech enhancement, and anomaly detection (e.g., Speech Enhancement Diffusion[13], RDDPM Anomaly[8]).
- Theoretical Foundations and Generalization: investigations of the underlying principles, including generalization bounds and certified robustness guarantees (e.g., Certifiably Robust Classifiers[48]).

A particularly active line of work contrasts methods that modify the training objective to account for corruption with those that adapt at test time or purify adversarial inputs. Within the Adversarial Training and Robustness Enhancement branch, Adversarial Diffusion Training[0] sits alongside Adversarial Training Diffusion[20], both emphasizing explicit adversarial formulations during training to build inherent robustness. This contrasts with approaches in the Training from Corrupted Data branch, such as GSURE Diffusion Training[4] or Robust Corrupted Dataset[5], which focus on statistical estimation under noise rather than adversarial perturbations. The original paper's emphasis on adversarial training formulations places it squarely in a cluster concerned with worst-case robustness, distinguishing it from neighboring works that prioritize handling natural corruption or domain shift. Open questions remain about the trade-offs between adversarial robustness and generative quality, and about whether unified frameworks can bridge adversarial and statistical corruption models.

Claimed Contributions

Principled formulation of adversarial training for diffusion models with equivariance constraint

The authors propose a novel adversarial training framework specifically designed for diffusion models. Unlike standard adversarial training for classifiers that enforces invariance, their method enforces equivariance to properly align with the denoising process and score-based generative modeling dynamics.

Retrieved papers: 4. Verdict: Can Refute.
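The invariance/equivariance contrast claimed above can be made concrete. As a hedged illustration only (the paper's exact constraint may differ), standard AT for a classifier f_θ enforces invariance of the output under a perturbation δ, whereas for an ε-prediction diffusion model a perturbation of the noisy input x_t = x_0 + σ_t ε shifts the regression target itself, which yields an equivariant objective:

```latex
% Standard adversarial training (invariance: the prediction must not change):
\min_\theta \; \mathbb{E}_{(x,y)} \, \max_{\|\delta\| \le \epsilon}
  \mathcal{L}\bigl(f_\theta(x + \delta),\, y\bigr)

% One equivariant analogue for noise prediction (illustrative, not
% necessarily the paper's formulation): attributing \delta to the noise
% shifts the target from \varepsilon to \varepsilon + \delta/\sigma_t,
% since x_t + \delta = x_0 + \sigma_t (\varepsilon + \delta/\sigma_t).
\min_\theta \; \mathbb{E}_{x_0, \varepsilon, t} \, \max_{\|\delta\| \le \epsilon}
  \bigl\| \varepsilon_\theta(x_0 + \sigma_t \varepsilon + \delta,\, t)
          - (\varepsilon + \delta/\sigma_t) \bigr\|^2
```

Under this reading the network is not asked to ignore δ (invariance) but to account for it consistently with the forward corruption process, which is what aligns the objective with the denoising dynamics.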
First formal introduction of adversarial training to denoising and diffusion models

The authors claim to be the first to formally introduce adversarial training for diffusion models, establishing connections to denoising and discussing practical implications on the learned denoising process, despite prior work on adversarial aspects in diffusion model training.

Retrieved papers: 10. Verdict: Can Refute.
Adversarial training algorithm tailored for score-based models enforcing local equivariance and smoothness

The authors develop a specialized adversarial training algorithm for score-based models that enforces equivariance rather than invariance. This approach is designed to promote local smoothness along diffusion trajectories while properly learning the data distribution.

Retrieved papers: 4. Verdict: Can Refute.

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
