LiveMoments: Reselected Key Photo Restoration in Live Photos via Reference-guided Diffusion

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: Live Photo, Reference-based Image Restoration, Conditional Image Generation, Motion Alignment
Abstract:

A Live Photo captures both a high-quality key photo and a short video clip that preserves the dynamics around the moment of capture. Users may reselect an alternative frame as the key photo to obtain a better expression or timing, but such frames often exhibit noticeable quality degradation, because the photo-capture ISP pipeline delivers significantly higher image quality than the video pipeline. This quality gap calls for dedicated restoration techniques for the reselected key photo. To this end, we propose LiveMoments, a reference-guided image restoration framework tailored to reselected key photos in Live Photos. Our method employs a two-branch neural network: a reference branch that extracts structural and textural information from the original high-quality key photo, and a main branch that restores the reselected frame under the guidance of the reference branch. We further introduce a unified Motion Alignment module that provides motion guidance for spatial alignment at both the latent and image levels. Experiments on real and synthetic Live Photos demonstrate that LiveMoments significantly improves perceptual quality and fidelity over existing solutions, especially in scenes with fast motion or complex structures.
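The abstract describes the Motion Alignment module only at a high level. As an illustrative sketch (not the authors' implementation), image-level alignment can be thought of as warping the high-quality reference frame toward the reselected frame using an estimated motion field. The `warp_nearest` helper and the integer-valued flow representation below are hypothetical simplifications; real systems typically use sub-pixel flow with bilinear sampling.

```python
# Hedged sketch of reference-to-target warping for motion alignment.
# `warp_nearest` and the (dy, dx) integer flow format are illustrative
# assumptions, not taken from the LiveMoments paper.

def warp_nearest(ref, flow):
    """Warp a 2-D grid `ref` toward the target using nearest-neighbor lookup.

    ref:  H x W list of lists holding feature (or pixel) values
    flow: H x W list of lists of (dy, dx) integer offsets; each target
          location (y, x) is filled from source (y + dy, x + dx) in `ref`
    """
    h, w = len(ref), len(ref[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)  # clamp source row to bounds
            sx = min(max(x + dx, 0), w - 1)  # clamp source column to bounds
            out[y][x] = ref[sy][sx]
    return out
```

In a diffusion framework like the one described, such a warp would be applied both to the reference image and to its latent features, so that the main branch fuses reference detail that is already spatially registered to the reselected frame.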

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LiveMoments, a reference-guided restoration framework for reselected key photos in Live Photos, addressing quality degradation when users choose alternative frames from the video clip. According to the taxonomy, this work occupies the 'Multi-Frame Photo Restoration with Motion Alignment' leaf under Reference-Guided Image Restoration, where it appears as the sole paper. This positioning suggests the paper targets a relatively sparse and specialized research direction within the broader image restoration landscape, focusing specifically on the photo-video quality gap in live photo capture systems.

The taxonomy reveals that neighboring research directions include Real-Time Facial Reenactment (focusing on expression transfer) and broader Video Quality Enhancement branches (super-resolution for streaming, archival restoration). LiveMoments diverges from these by exploiting the unique structure of live photos: a high-quality reference frame paired with lower-quality video frames. Unlike general video enhancement methods that lack reference guidance, or facial reenactment techniques targeting expression manipulation, this work specifically addresses the ISP pipeline quality disparity between photo and video capture modes, carving out a distinct problem space at the intersection of multi-frame fusion and reference-based restoration.

Among the 30 candidates examined through semantic search, none clearly refute the three main contributions. The reselected key photo restoration task (10 candidates examined, 0 refutable) appears novel within this limited scope, as does the reference-guided diffusion framework (10 candidates, 0 refutable) and the LiveMoments benchmark dataset (10 candidates, 0 refutable). The absence of refutable prior work across all contributions suggests either genuine novelty or limitations in the search scope. The specialized nature of the live photo restoration problem may explain why existing multi-frame restoration or video enhancement methods do not directly overlap with these specific contributions.

Based on the limited literature search covering 30 semantically similar papers, the work appears to address an underexplored problem space with no direct prior solutions identified. However, the small search scope and the paper's isolation within its taxonomy leaf warrant caution: a broader survey of reference-guided restoration, burst photography, or computational photography venues might reveal closer related work. The analysis captures top semantic matches but may not reflect the full landscape of multi-frame image enhancement research.

Taxonomy

Core-task Taxonomy Papers: 6
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Reselected key photo restoration in live photos. This field addresses the challenge of enhancing a user-selected frame from a live photo sequence, leveraging temporal information from neighboring frames to improve quality.

The taxonomy organizes work into four main branches. Reference-Guided Image Restoration focuses on methods that exploit multiple frames or reference images to restore a target photo, often requiring careful motion alignment and feature aggregation. Video Quality Enhancement encompasses techniques for improving temporal sequences, including super-resolution and artifact removal across frames. Microscopy Imaging Enhancement targets specialized scientific imaging where temporal data can reveal finer structural details. Real-Time Visual Processing emphasizes low-latency methods suitable for interactive applications. Representative works such as SuperResolution Streaming[1] and Trajectory SuperResolution[4] illustrate how temporal context can be harnessed for quality improvement, while CellINR[5] demonstrates domain-specific enhancements in microscopy. Several active lines of work explore trade-offs between computational efficiency and restoration quality, particularly when aligning frames with complex motion or handling degraded archival content.

Within Reference-Guided Image Restoration, a small cluster of methods tackles multi-frame photo restoration with motion alignment, where the central challenge is to register and fuse information from temporally adjacent frames without introducing artifacts. LiveMoments[0] sits squarely in this cluster, emphasizing the restoration of a reselected keyframe by aligning and aggregating features from the live photo burst. Compared to approaches like Archival Enhancement[6], which may prioritize static degradation repair, LiveMoments[0] leverages the temporal redundancy inherent in live photo sequences. This positions it closer to video-inspired techniques such as Trajectory SuperResolution[4], yet with a focus on single-frame output rather than continuous playback, reflecting the unique user interaction model of live photos.

Claimed Contributions

Reselected Key Photo Restoration task for Live Photos

The authors define a new problem of restoring a blurry frame that users select as their preferred key photo in Live Photos by leveraging adjacent sharp frames from the same capture sequence as reference guidance.

10 retrieved papers

Reference-guided diffusion framework for key photo restoration

The authors develop a diffusion-based restoration method that incorporates temporal information from neighboring frames in the Live Photo sequence to enhance the quality of the user-selected blurry key frame.

10 retrieved papers

LiveMoments benchmark dataset

The authors create a dedicated benchmark dataset called LiveMoments to facilitate evaluation and research on the task of restoring reselected key photos in Live Photo sequences.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

Reselected Key Photo Restoration task for Live Photos

The authors define a new problem of restoring a blurry frame that users select as their preferred key photo in Live Photos by leveraging adjacent sharp frames from the same capture sequence as reference guidance.

Contribution 2

Reference-guided diffusion framework for key photo restoration

The authors develop a diffusion-based restoration method that incorporates temporal information from neighboring frames in the Live Photo sequence to enhance the quality of the user-selected blurry key frame.

Contribution 3

LiveMoments benchmark dataset

The authors create a dedicated benchmark dataset called LiveMoments to facilitate evaluation and research on the task of restoring reselected key photos in Live Photo sequences.