FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring

ICLR 2026 Conference Submission. Anonymous Authors.
Image Motion Deblurring, Diffusion Model
Abstract:

Recent years have seen significant progress in image motion deblurring driven by CNNs and transformers. Large-scale pre-trained diffusion models, rich in real-world priors, have shown great promise for high-quality image restoration tasks such as deblurring, demonstrating stronger generative capabilities than CNN- and transformer-based methods. However, challenges such as prohibitive inference time and compromised fidelity still limit the full potential of diffusion models. To address this, we introduce FideDiff, a novel single-step diffusion model designed for high-fidelity deblurring. We reformulate motion deblurring as a diffusion-like process in which each timestep represents a progressively blurred image, and we train a consistency model that aligns all timesteps to the same clean image. By reconstructing training data with matched blur trajectories, the model learns temporal consistency, enabling accurate one-step deblurring. We further enhance performance by integrating Kernel ControlNet for blur kernel estimation and introducing adaptive timestep prediction. Our model achieves superior performance on full-reference metrics, surpassing previous diffusion-based methods and matching other state-of-the-art models. FideDiff offers a new direction for applying pre-trained diffusion models to high-fidelity image restoration and establishes a robust baseline for advancing diffusion models in real-world industrial applications. Our dataset and code will be publicly available.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces FideDiff, a single-step diffusion model for motion deblurring that reformulates the task as a diffusion-like process with temporal consistency training. It resides in the 'Single-Step and Accelerated Diffusion Inference' leaf, which contains only two papers including this one. This is a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting that single-step diffusion approaches for deblurring remain an emerging area compared to more established branches like iterative refinement or transformer-based architectures.

The taxonomy reveals that FideDiff sits within the 'Diffusion Model Architecture and Training Strategy' branch, which also includes neighboring leaves on iterative multi-step sampling, coarse-to-fine hierarchical methods, transformer-based architectures, and frequency-domain guidance. These adjacent directions represent alternative strategies for balancing quality and efficiency: iterative methods prioritize progressive refinement over speed, while hierarchical and transformer-based approaches focus on architectural expressiveness. FideDiff's single-step approach diverges from these by emphasizing inference speed, positioning it closer to practical deployment scenarios than computationally intensive multi-step alternatives.

Among 14 candidates examined across three contributions, the analysis found limited prior work overlap. The core reformulation of deblurring as a diffusion-like process examined 3 candidates with no clear refutations, suggesting relative novelty in this framing. However, the FideDiff model itself examined 10 candidates with 1 refutable match, and the Kernel ControlNet component examined 1 candidate with 1 refutable match, indicating that specific technical components have more substantial prior work. The modest search scope—14 candidates total—means these findings reflect top semantic matches rather than exhaustive coverage of the field.

Based on the limited literature search, FideDiff appears to occupy a sparsely populated niche within diffusion-based deblurring, particularly in single-step inference. The taxonomy context suggests the work addresses a recognized efficiency challenge, though the contribution-level statistics indicate that individual technical components may have closer precedents than the overall system integration. The analysis covers top-30 semantic matches and does not claim exhaustive field coverage.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 14
Refutable Papers: 2

Research Landscape Overview

Core task: image motion deblurring using diffusion models. The field has evolved into a rich taxonomy spanning architectural innovations, blur-specific conditioning strategies, multi-modal integration, video extensions, joint restoration tasks, plug-and-play frameworks, data synthesis techniques, domain-specific applications, and theoretical foundations.

The main branches reflect a tension between improving diffusion model efficiency—through accelerated sampling, single-step inference, and lightweight architectures—and enhancing restoration quality via blur priors, auxiliary signals such as events or depth, and multi-task learning. Works like DeblurDiff[1] and Swin-Diff[2] exemplify architectural refinements, while branches on multi-modal integration (e.g., Multimodal Defocus Deblurring[3]) and video deblurring (e.g., Temporal Video Deblurring[5]) address richer input modalities and temporal consistency. Plug-and-play methods (e.g., Plug-and-Play Restoration[14]) and domain-specific applications (e.g., CT Motion Artifact[19]) demonstrate the breadth of diffusion-based deblurring beyond natural images.

A particularly active line of work focuses on accelerating diffusion inference to make these models practical for real-time or resource-constrained scenarios. FideDiff[0] sits squarely within this branch, emphasizing single-step and accelerated diffusion inference alongside One-Step Motion Deblurring[18]. Compared to iterative multi-step approaches like DeblurDiff[1] or hierarchical strategies such as Hierarchical Wavelet Diffusion[17], FideDiff[0] prioritizes speed and efficiency, trading off some iterative refinement for faster convergence. This contrasts with works that integrate auxiliary signals (e.g., Event Generation Deblurring[12]) or tackle joint tasks (e.g., Joint Deblurring Super-Resolution[8]), which often accept higher computational costs for richer conditioning.
The open question remains how to balance the fidelity gains of multi-step diffusion with the practical demands of single-step inference, a trade-off that FideDiff[0] directly addresses within the broader landscape of diffusion-based deblurring.

Claimed Contributions

Reformulation of motion deblurring as diffusion-like process with time-consistency training

The authors redefine motion deblurring as a diffusion-like process where each timestep represents a progressively blurred image. They train a consistency model that aligns all timesteps to the same clean image, enabling accurate one-step deblurring by learning temporal consistency through reconstructed training data with matched blur trajectories.

3 retrieved papers
FideDiff: single-step high-fidelity foundation model for deblurring

The authors introduce FideDiff, a novel single-step diffusion model designed for high-fidelity image motion deblurring. The model leverages pre-trained diffusion priors and achieves superior performance on full-reference metrics while maintaining fidelity, addressing the trade-off between inference time and restoration quality.

10 retrieved papers
Can Refute
Kernel ControlNet and adaptive timestep prediction module

The authors propose Kernel ControlNet, which integrates blur kernel estimation and control information in the form of filters into the foundation model. They also design a regression module for adaptive timestep prediction, enabling the model to dynamically select appropriate timesteps based on blur severity during inference.

1 retrieved paper
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Reformulation of motion deblurring as diffusion-like process with time-consistency training

The authors redefine motion deblurring as a diffusion-like process where each timestep represents a progressively blurred image. They train a consistency model that aligns all timesteps to the same clean image, enabling accurate one-step deblurring by learning temporal consistency through reconstructed training data with matched blur trajectories.
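The timestep-to-blur mapping described above can be sketched in a few lines. The following is a minimal, illustrative construction of a "matched blur trajectory" in NumPy, where larger timesteps correspond to wider Gaussian blurs of the same clean signal; the helper names and the Gaussian choice are our assumptions for illustration, not the paper's actual data pipeline.

```python
import numpy as np

def gaussian_kernel(sigma, radius=4):
    # 1-D Gaussian kernel; hypothetical helper, not from the paper.
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / max(sigma, 1e-6)) ** 2)
    return k / k.sum()

def blur_trajectory(x0, num_steps=5, max_sigma=3.0):
    """Map timesteps t = 0..num_steps to progressively blurred copies of x0.

    t = 0 is the clean signal; larger t uses a wider blur kernel, mirroring
    the reformulation of blur severity as a diffusion timestep.
    """
    traj = [x0]
    for t in range(1, num_steps + 1):
        sigma = max_sigma * t / num_steps
        traj.append(np.convolve(x0, gaussian_kernel(sigma), mode="same"))
    return traj

# A consistency model f(x_t, t) would then be trained so its output matches
# the same clean x0 at every point on this trajectory, e.g. with a loss of
# the form sum_t ||f(x_t, t) - x0||^2 (notation ours, not the paper's).
```

Because every element of the trajectory shares the same clean target, a model trained this way can in principle jump from any blur level to the clean image in one step, which is the basis of the one-step inference claim.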

Contribution

FideDiff: single-step high-fidelity foundation model for deblurring

The authors introduce FideDiff, a novel single-step diffusion model designed for high-fidelity image motion deblurring. The model leverages pre-trained diffusion priors and achieves superior performance on full-reference metrics while maintaining fidelity, addressing the trade-off between inference time and restoration quality.

Contribution

Kernel ControlNet and adaptive timestep prediction module

The authors propose Kernel ControlNet, which integrates blur kernel estimation and control information in the form of filters into the foundation model. They also design a regression module for adaptive timestep prediction, enabling the model to dynamically select appropriate timesteps based on blur severity during inference.
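The adaptive timestep prediction idea can be illustrated with a toy severity estimator. The sketch below uses the variance of a discrete Laplacian response as a sharpness proxy and maps lower relative sharpness to a larger (blurrier) timestep; this is an assumption-laden stand-in for the paper's learned regression module, and all names (`blur_severity`, `predict_timestep`, `sharp_ref`) are ours.

```python
import numpy as np

def blur_severity(img):
    # Toy sharpness proxy: variance of a discrete Laplacian response.
    # Blurry images have weaker high-frequency content, so this drops.
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
           np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    return lap.var()

def predict_timestep(img, num_steps=5, sharp_ref=None):
    """Map estimated blur severity to a discrete timestep in [0, num_steps].

    sharp_ref is the Laplacian variance expected of a sharp image; a lower
    sharpness ratio maps to a larger timestep. A learned regressor, as in
    the paper, would replace this hand-crafted rule.
    """
    s = blur_severity(img)
    ref = sharp_ref if sharp_ref is not None else s
    ratio = np.clip(s / max(ref, 1e-12), 0.0, 1.0)
    return int(round((1.0 - ratio) * num_steps))
```

At inference time, such a predictor lets the model enter the diffusion-like trajectory at a timestep matched to the observed blur severity rather than at a fixed starting point.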