Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: generative model, diffusion distillation
Abstract:

Learning generative models directly from corrupted observations is a long-standing challenge across natural and scientific domains. We introduce Distillation from Corrupted Data (DCD), a unified framework for learning high-fidelity, one-step generative models using only degraded data of the form $y = \mathcal{A}(x) + \sigma \varepsilon$, with $x \sim p_X$ and $\varepsilon \sim \mathcal{N}(0, I_m)$, where the mapping $\mathcal{A}$ may be the identity or a non-invertible corruption operator (e.g., blur, masking, subsampling, Fourier acquisition). DCD first pretrains a corruption-aware diffusion teacher on the observed measurements, then distills it into an efficient one-step generator whose samples are statistically closer to the clean distribution $p_X$. The framework subsumes identity corruption (the denoising task) as a special case of our general formulation.

Empirically, DCD consistently reduces Fréchet Inception Distance (FID) relative to corruption-aware diffusion teachers across noisy generation (CIFAR-10, FFHQ, CelebA-HQ, AFHQ-v2), image restoration (Gaussian deblurring, random inpainting, super-resolution, and mixtures with additive noise), and multi-coil MRI, without access to any clean images. The distilled generator inherits one-step sampling efficiency, yielding up to $30\times$ speedups over multi-step diffusion while surpassing the teachers after substantially fewer training iterations. These results establish score distillation as a practical tool for generative modeling from corrupted data, not merely for acceleration. We also provide theoretical support in the Appendix for the use of distillation in enhancing generation quality.
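For intuition, the measurement model in the abstract can be simulated in a few lines. This is a minimal illustrative sketch, not the paper's actual setup: the random-masking operator, the `keep_prob` parameter, and the array sizes are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(x, keep_prob=0.7):
    """Hypothetical non-invertible corruption operator A: random pixel masking."""
    mask = rng.random(x.shape) < keep_prob
    return x * mask

def corrupt(x, operator, sigma=0.1):
    """Simulate the measurement model y = A(x) + sigma * eps."""
    eps = rng.standard_normal(x.shape)
    return operator(x) + sigma * eps

x = rng.random((32, 32))      # stand-in for a clean sample x ~ p_X
y = corrupt(x, random_mask)   # the degraded observation the model trains on
print(y.shape)
```

Under this model the learner never sees `x`, only `y`; identity corruption (`operator = lambda v: v`) recovers the pure denoising task mentioned above.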

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Distillation from Corrupted Data (DCD), a framework that trains one-step generative models directly from degraded observations by first pretraining a corruption-aware diffusion teacher and then distilling it into an efficient generator. This work resides in the 'Score Distillation and One-Step Generative Models' leaf of the taxonomy, which contains only two papers total. This sparse population suggests the specific combination of distillation techniques with corruption-aware diffusion is an emerging research direction rather than a crowded subfield.

The taxonomy reveals that DCD's parent branch—diffusion-based generative models from corrupted data—encompasses several neighboring approaches including score-based denoising methods, EM-based diffusion training frameworks, and application-specific diffusion models for domains like medical imaging. While these adjacent leaves address corrupted observations through multi-step diffusion or expectation-maximization, they exclude distillation-focused methods by design. The broader taxonomy also shows parallel paradigms using GANs (e.g., AmbientGAN) and VAEs (e.g., MIWAE variants) for corrupted data, indicating that DCD's diffusion-distillation approach represents one architectural choice among multiple viable frameworks.

Among the three analyzed contributions, the literature search examined 24 candidate papers total. The core DCD framework examined 10 candidates with zero refutations, the modular training pipeline examined 4 candidates with zero refutations, and the theoretical analysis examined 10 candidates with zero refutations. This limited search scope—focused on top semantic matches rather than exhaustive coverage—suggests that within the examined subset, no prior work directly anticipates the specific combination of corruption-aware diffusion pretraining followed by distillation into one-step generators. The absence of refutations across all contributions indicates potential novelty, though the small candidate pool limits definitive conclusions.

Based on the 24-paper search scope and the sparse two-paper taxonomy leaf, the work appears to occupy a relatively unexplored intersection of distillation techniques and corruption-aware generative modeling. However, the analysis does not cover the full landscape of diffusion distillation methods or corruption-handling frameworks outside the top semantic matches. The taxonomy structure suggests this direction is nascent rather than saturated, but broader literature beyond the examined candidates may contain relevant precursors or parallel developments.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 24
- Refutable Papers: 0

Research Landscape Overview

Core task: generative modeling from corrupted observations. This field addresses the challenge of learning generative models when training data is incomplete, noisy, or otherwise degraded. The taxonomy reveals a rich landscape organized primarily by model architecture: diffusion-based approaches have emerged as a dominant paradigm, alongside GAN-based methods that learn from incomplete or noisy inputs, VAE-based frameworks such as MIWAE[2] and extensions like not-MIWAE[7] that handle missing data through importance weighting, and flow-based invertible models exemplified by AmbientFlow[29].

Additional branches capture generative priors for inverse problems, structured low-rank recovery techniques, domain-specific applications ranging from protein structure modeling to medical imaging, and meta-analyses exploring theoretical foundations and robustness properties. Works like AmbientGAN[13] pioneered GAN training under measurement corruption, while recent efforts such as EM Clean Diffusion[11] and Diffem[14] adapt diffusion models to learn directly from corrupted observations.

Within the diffusion branch, a particularly active line of research focuses on score-based methods and distillation techniques that enable efficient one-step or few-step generation even when observations are degraded. Score Distillation Corrupted[0] sits squarely in this cluster, emphasizing distillation strategies that compress multi-step diffusion into faster samplers while handling corruption. This contrasts with neighboring work like Corrupted Data Learning[5], which may prioritize different corruption models or training objectives, and complements foundational score-matching frameworks such as Score Matching[3].

Across these branches, key trade-offs emerge between model expressiveness, computational efficiency, and the types of corruption (missing entries, additive noise, nonlinear measurements) that can be handled. Open questions include how best to incorporate domain knowledge into corruption-aware architectures and whether unified frameworks can bridge the gap between diffusion, GAN, and VAE paradigms for this task.

Claimed Contributions

Distillation from Corrupted Data (DCD) framework

The authors propose DCD, a two-phase framework that first pretrains a corruption-aware diffusion teacher on observed measurements, then distills it into an efficient one-step generator. This unified approach handles diverse corruption operators including identity (denoising), blur, masking, subsampling, and Fourier acquisition without requiring clean images.

10 retrieved papers
Modular two-phase training pipeline

The framework features a modular design where Phase I flexibly incorporates existing corruption-aware diffusion objectives (summarized in Table 1), and Phase II performs distillation while explicitly respecting the measurement operator. This modularity allows straightforward integration of new forward operators or training objectives.

4 retrieved papers
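The modularity claim above can be sketched as code: both phases consume the same forward operator, so a new corruption type only requires registering one function. This is a minimal sketch under assumptions; the registry, function names, and placeholder losses below are illustrative and are not the paper's actual objectives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward-operator registry shared by both phases.
OPERATORS = {
    "identity": lambda x: x,
    "mask":     lambda x: x * (rng.random(x.shape) < 0.7),
}

def phase1_objective(teacher, y, operator):
    """Placeholder Phase I loss: the teacher is trained on measurements y only."""
    return float(np.mean((operator(teacher(y)) - y) ** 2))

def phase2_objective(student, teacher, z, operator):
    """Placeholder Phase II loss: a one-step student is distilled from the
    teacher, compared in measurement space so the operator A is respected."""
    x_student = student(z)                    # single forward pass
    x_teacher = teacher(operator(x_student))  # teacher refinement target
    return float(np.mean((operator(x_student) - operator(x_teacher)) ** 2))

# Toy usage with identity "networks" just to exercise the interfaces.
op = OPERATORS["mask"]
y = rng.random((8, 8))
print(phase1_objective(lambda v: v, y, op) >= 0.0)
```

Swapping `OPERATORS["mask"]` for a blur, subsampling, or Fourier operator would leave both objectives untouched, which is the design property the contribution describes.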
Theoretical analysis of distillation for quality enhancement

The authors provide theoretical analysis establishing conditions under which distillation yields improved sample quality beyond acceleration. They explain that the reversed distillation objective induces mode-seeking behavior, allowing the generator to concentrate probability mass on high-density regions while discarding diffuse areas that the teacher includes due to its mode-covering objective.

10 retrieved papers
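The mode-seeking argument rests on the direction of the KL divergence. The contrast between the two directions is a textbook identity (standard notation, not taken from the paper):

```latex
% Forward KL (mode-covering): heavily penalizes q(x) \approx 0 wherever
% p(x) > 0, so the learner spreads mass over all modes of p.
\mathrm{KL}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx

% Reverse KL (mode-seeking): heavily penalizes q(x) > 0 wherever
% p(x) \approx 0, so the learner may drop diffuse low-density regions of p.
\mathrm{KL}(q \,\|\, p) = \int q(x) \log \frac{q(x)}{p(x)} \, dx
```

With `p` as the teacher's distribution and `q` as the one-step generator's, a reversed objective of this kind would let the student concentrate mass on high-density regions, consistent with the quality gains claimed above.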

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

