Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: generative model, diffusion distillation
Abstract:

Learning generative models directly from corrupted observations is a long-standing challenge across natural and scientific domains. We introduce Distillation from Corrupted Data (DCD), a unified framework for learning high-fidelity, one-step generative models using only degraded data of the form $y = \mathcal{A}(x) + \sigma \varepsilon$, with $x \sim p_X$ and $\varepsilon \sim \mathcal{N}(0, I_m)$, where the mapping $\mathcal{A}$ may be the identity or a non-invertible corruption operator (e.g., blur, masking, subsampling, Fourier acquisition). DCD first pretrains a corruption-aware diffusion teacher on the observed measurements, then distills it into an efficient one-step generator whose samples are statistically closer to the clean distribution $p_X$. The framework subsumes identity corruption (the denoising task) as a special case of our general formulation.

Empirically, DCD consistently reduces Fréchet Inception Distance (FID) relative to corruption-aware diffusion teachers across noisy generation (CIFAR-10, FFHQ, CelebA-HQ, AFHQ-v2), image restoration (Gaussian deblurring, random inpainting, super-resolution, and mixtures with additive noise), and multi-coil MRI, without access to any clean images. The distilled generator inherits one-step sampling efficiency, yielding up to $30\times$ speedups over multi-step diffusion while surpassing the teachers after substantially fewer training iterations. These results establish score distillation as a practical tool for generative modeling from corrupted data, not merely for acceleration. We also provide theoretical support in the Appendix for the use of distillation in enhancing generation quality.
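For intuition, the measurement model in the abstract can be simulated in a few lines. This is a minimal illustrative sketch, not the paper's actual setup: the random-masking operator, the `keep_prob` parameter, and the array sizes are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(x, keep_prob=0.7):
    """Hypothetical non-invertible corruption operator A: random pixel masking."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask

def corrupt(x, operator, sigma=0.1):
    """Simulate the measurement model y = A(x) + sigma * eps."""
    eps = rng.standard_normal(x.shape)
    return operator(x) + sigma * eps

x = rng.random((32, 32))      # stand-in for a clean sample x ~ p_X
y = corrupt(x, random_mask)   # the degraded observation the model trains on
print(y.shape)
```

Under this model the learner never sees `x`, only `y`; identity corruption (`operator = lambda v: v`) recovers the pure denoising task mentioned above.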

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Distillation from Corrupted Data (DCD), a framework that trains one-step generative models directly from degraded observations by first pretraining a corruption-aware diffusion teacher and then distilling it into an efficient generator. This work resides in the 'Score Distillation and One-Step Generative Models' leaf of the taxonomy, which contains only two papers total. This sparse population suggests the specific combination of distillation techniques with corruption-aware diffusion is an emerging research direction rather than a crowded subfield.

The taxonomy reveals that DCD's parent branch—diffusion-based generative models from corrupted data—encompasses several neighboring approaches including score-based denoising methods, EM-based diffusion training frameworks, and application-specific diffusion models for domains like medical imaging. While these adjacent leaves address corrupted observations through multi-step diffusion or expectation-maximization, they exclude distillation-focused methods by design. The broader taxonomy also shows parallel paradigms using GANs (e.g., AmbientGAN) and VAEs (e.g., MIWAE variants) for corrupted data, indicating that DCD's diffusion-distillation approach represents one architectural choice among multiple viable frameworks.

Among the three analyzed contributions, the literature search examined 24 candidate papers total. The core DCD framework examined 10 candidates with zero refutations, the modular training pipeline examined 4 candidates with zero refutations, and the theoretical analysis examined 10 candidates with zero refutations. This limited search scope—focused on top semantic matches rather than exhaustive coverage—suggests that within the examined subset, no prior work directly anticipates the specific combination of corruption-aware diffusion pretraining followed by distillation into one-step generators. The absence of refutations across all contributions indicates potential novelty, though the small candidate pool limits definitive conclusions.

Based on the 24-paper search scope and the sparse two-paper taxonomy leaf, the work appears to occupy a relatively unexplored intersection of distillation techniques and corruption-aware generative modeling. However, the analysis does not cover the full landscape of diffusion distillation methods or corruption-handling frameworks outside the top semantic matches. The taxonomy structure suggests this direction is nascent rather than saturated, but broader literature beyond the examined candidates may contain relevant precursors or parallel developments.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 24
- Refutable Papers: 0

Research Landscape Overview

Core task: generative modeling from corrupted observations. This field addresses the challenge of learning generative models when training data is incomplete, noisy, or otherwise degraded. The taxonomy reveals a rich landscape organized primarily by model architecture: diffusion-based approaches have emerged as a dominant paradigm, alongside GAN-based methods that learn from incomplete or noisy inputs, VAE-based frameworks such as MIWAE[2] and extensions like not-MIWAE[7] that handle missing data through importance weighting, and flow-based invertible models exemplified by AmbientFlow[29].

Additional branches capture generative priors for inverse problems, structured low-rank recovery techniques, domain-specific applications ranging from protein structure modeling to medical imaging, and meta-analyses exploring theoretical foundations and robustness properties. Works like AmbientGAN[13] pioneered GAN training under measurement corruption, while recent efforts such as EM Clean Diffusion[11] and Diffem[14] adapt diffusion models to learn directly from corrupted observations.

Within the diffusion branch, a particularly active line of research focuses on score-based methods and distillation techniques that enable efficient one-step or few-step generation even when observations are degraded. Score Distillation Corrupted[0] sits squarely in this cluster, emphasizing distillation strategies that compress multi-step diffusion into faster samplers while handling corruption. This contrasts with neighboring work like Corrupted Data Learning[5], which may prioritize different corruption models or training objectives, and complements foundational score-matching frameworks such as Score Matching[3].

Across these branches, key trade-offs emerge between model expressiveness, computational efficiency, and the types of corruption (missing entries, additive noise, nonlinear measurements) that can be handled. Open questions include how best to incorporate domain knowledge into corruption-aware architectures and whether unified frameworks can bridge the gap between diffusion, GAN, and VAE paradigms for this task.

Claimed Contributions

Distillation from Corrupted Data (DCD) framework

The authors propose DCD, a two-phase framework that first pretrains a corruption-aware diffusion teacher on observed measurements, then distills it into an efficient one-step generator. This unified approach handles diverse corruption operators including identity (denoising), blur, masking, subsampling, and Fourier acquisition without requiring clean images.

10 retrieved papers
Modular two-phase training pipeline

The framework features a modular design where Phase I flexibly incorporates existing corruption-aware diffusion objectives (summarized in Table 1), and Phase II performs distillation while explicitly respecting the measurement operator. This modularity allows straightforward integration of new forward operators or training objectives.

4 retrieved papers
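The modularity claim above can be sketched as code: both phases consume the same forward operator, so a new corruption type only requires registering one function. This is a minimal sketch under assumptions; the registry, function names, and placeholder losses below are illustrative and are not the paper's actual objectives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward-operator registry shared by both phases.
OPERATORS = {
    "identity": lambda x: x,
    "mask":     lambda x: x * (rng.random(x.shape) < 0.7),
}

def phase1_objective(teacher, y, operator):
    """Placeholder Phase I loss: the teacher is trained on measurements y only."""
    return float(np.mean((operator(teacher(y)) - y) ** 2))

def phase2_objective(student, teacher, z, operator):
    """Placeholder Phase II loss: a one-step student is distilled from the
    teacher, compared in measurement space so the operator A is respected."""
    x_student = student(z)                    # single forward pass
    x_teacher = teacher(operator(x_student))  # teacher refinement target
    return float(np.mean((operator(x_student) - operator(x_teacher)) ** 2))

# Toy usage with identity "networks" just to exercise the interfaces.
op = OPERATORS["mask"]
y = rng.random((8, 8))
print(phase1_objective(lambda v: v, y, op) >= 0.0)
```

Swapping `OPERATORS["mask"]` for a blur, subsampling, or Fourier operator would leave both objectives untouched, which is the design property the contribution describes.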
Theoretical analysis of distillation for quality enhancement

The authors provide theoretical analysis establishing conditions under which distillation yields improved sample quality beyond acceleration. They explain that the reversed distillation objective induces mode-seeking behavior, allowing the generator to concentrate probability mass on high-density regions while discarding diffuse areas that the teacher includes due to its mode-covering objective.

10 retrieved papers
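The mode-seeking argument rests on the direction of the KL divergence. The contrast between the two directions is a textbook identity (standard notation, not taken from the paper):

```latex
% Forward KL (mode-covering): heavily penalizes q(x) \approx 0 wherever
% p(x) > 0, so the learner spreads mass over all modes of p.
\mathrm{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx

% Reverse KL (mode-seeking): heavily penalizes q(x) > 0 wherever
% p(x) \approx 0, so the learner may drop diffuse low-density regions of p.
\mathrm{KL}(q \,\|\, p) = \int q(x) \log \frac{q(x)}{p(x)} \, dx
```

With `p` as the teacher's distribution and `q` as the one-step generator's, a reversed objective of this kind would let the student concentrate mass on high-density regions, consistent with the quality gains claimed above.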

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

