Continuous Space-Time Video Super-Resolution with 3D Fourier Fields

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: continuous space-time video super-resolution, arbitrary-scale super-resolution, low-level vision
Abstract:

We introduce a novel formulation for continuous space-time video super-resolution. Instead of decoupling the representation of a video sequence into separate spatial and temporal components and relying on brittle, explicit frame warping for motion compensation, we encode video as a continuous, spatio-temporally coherent 3D Video Fourier Field (VFF). This representation offers three key advantages: (1) it enables cheap, flexible sampling at arbitrary locations in space and time; (2) it simultaneously captures fine spatial detail and smooth temporal dynamics; and (3) it allows an analytical Gaussian point spread function to be included in the sampling, ensuring aliasing-free reconstruction at arbitrary scale. The coefficients of the proposed Fourier-like sinusoidal basis are predicted by a neural encoder with a large spatio-temporal receptive field, conditioned on the low-resolution input video. Through extensive experiments, we show that our joint modeling substantially improves both spatial and temporal super-resolution and sets a new state of the art on multiple benchmarks: across a wide range of upscaling factors, it delivers sharper and temporally more consistent reconstructions than existing baselines, while being computationally more efficient. Code will be published upon acceptance.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a Video Fourier Field (VFF) representation that encodes video as a continuous 3D spatio-temporal function in the frequency domain, enabling arbitrary-scale sampling in both space and time. According to the taxonomy, this work resides in the 'Fourier and Frequency Domain INR' leaf under 'Implicit Neural Representation Approaches', which contains only two papers total. This indicates a relatively sparse research direction within the broader field of continuous space-time video super-resolution, suggesting the frequency-domain formulation for implicit neural video representations remains underexplored compared to motion-based or transformer-based approaches.

The taxonomy reveals that neighboring leaves include 'Local Implicit Neural Functions' (focusing on motion trajectory modeling) and 'Arbitrary-Scale Alignment Networks' (emphasizing neural alignment modules). The broader 'Implicit Neural Representation Approaches' branch contrasts sharply with the heavily populated 'Motion Estimation and Compensation Frameworks' branch, which contains multiple subcategories addressing optical flow, deformable convolutions, and bidirectional propagation. The paper's frequency-domain approach diverges from these motion-centric methods by avoiding explicit frame warping, instead relying on learned Fourier coefficients to capture spatio-temporal coherence—a fundamentally different modeling philosophy that positions it at the intersection of signal processing and neural representation learning.

Among the three contributions analyzed, the V3 end-to-end framework has one refutable candidate among the ten examined, suggesting some overlap with existing architectures within the limited search scope. For the Video Fourier Field representation itself, four candidates were examined with zero refutations, indicating potential novelty within the analyzed subset. For the analytical Gaussian point spread function for anti-aliasing, ten candidates were examined without clear refutation. Given that only twenty-four candidates were examined in total across all contributions, these statistics reflect a focused but not exhaustive literature comparison, primarily capturing semantically similar works rather than the entire field of implicit neural video representations.

Based on the limited search scope of twenty-four candidates, the work appears to occupy a distinctive position within the sparse Fourier-domain INR cluster. The taxonomy structure suggests this frequency-based formulation represents a less-traveled path compared to motion-compensation or transformer-based alternatives. However, the analysis does not cover the full landscape of implicit neural representations or signal processing techniques for video, leaving open questions about connections to broader frequency-domain methods outside the examined candidate set.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 1

Research Landscape Overview

Core task: continuous space-time video super-resolution aims to enhance video quality by jointly upsampling both spatial resolution and temporal frame rate to arbitrary scales. The field has evolved into several major branches that reflect different modeling philosophies and technical emphases. Implicit Neural Representation Approaches leverage coordinate-based networks to represent video content continuously, enabling flexible querying at any spatial or temporal location; within this branch, some works explore Fourier and frequency domain formulations while others focus on hierarchical or alignment-driven strategies. Motion Estimation and Compensation Frameworks build on classical optical flow techniques, refining alignment and warping mechanisms to propagate information across frames, as seen in methods like BasicVSR++[11] and Multi-Stage Motion[6]. Transformer-Based Architectures exploit self-attention to capture long-range dependencies, with works such as RSTT[13] and Trajectory-Aware Transformer[15] demonstrating the power of global context. Event Camera Enhanced Methods incorporate asynchronous event data to improve temporal fidelity, exemplified by EvSTVSR[1] and EvEnhancer[16]. Additional branches address Specialized Applications (e.g., omnidirectional or satellite video), Auxiliary Task and Multi-Task Learning, and Multimodal Understanding, reflecting the diversity of problem settings and data modalities.

Recent research reveals contrasting trade-offs between representation flexibility and computational efficiency. Implicit neural methods offer continuous querying and compact parameterizations, yet often require careful design to handle high-frequency details and temporal consistency. Motion-based frameworks excel at leveraging inter-frame correlations but can struggle with occlusions and complex motion patterns, prompting hybrid designs that combine flow estimation with learned refinement modules.
Fourier Fields[0] sits within the Fourier and Frequency Domain INR cluster, emphasizing spectral representations to capture fine-grained spatiotemporal patterns more efficiently than purely spatial coordinate mappings. This approach contrasts with neighboring works like MeshfreeFlowNet[8], which adopts a meshfree interpolation perspective, and HR-INR[2], which focuses on hierarchical implicit structures. By operating in the frequency domain, Fourier Fields[0] aims to balance expressive power with computational tractability, addressing a key challenge in continuous space-time super-resolution where both spatial sharpness and temporal smoothness must be preserved across arbitrary scales.

Claimed Contributions

Video Fourier Field (VFF) representation

A unified continuous video representation based on a 3D trigonometric expansion over joint (x, y, t) space. Unlike prior methods that decouple spatial and temporal components, VFF jointly models space and time using sinusoidal basis functions, enabling flexible sampling at arbitrary spatio-temporal resolutions without explicit warping.
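A minimal sketch of what such a representation looks like, under assumptions: a cosine basis with per-term amplitude, phase, and 3D frequency. The function name `sample_vff` and its parameterization are illustrative, not taken from the paper.

```python
import numpy as np

def sample_vff(coeffs, phases, freqs, x, y, t):
    """Evaluate a toy 3D Video Fourier Field at a continuous point (x, y, t).

    coeffs: (K,) amplitudes; phases: (K,) phase offsets; freqs: (K, 3)
    per-term frequencies (fx, fy, ft). Because the basis is defined over
    joint (x, y, t) space, any real-valued coordinate can be queried,
    with no separate spatial/temporal pathway and no frame warping.
    """
    arg = 2.0 * np.pi * (freqs[:, 0] * x + freqs[:, 1] * y + freqs[:, 2] * t)
    return float(np.sum(coeffs * np.cos(arg + phases)))
```

Querying between input frames (fractional `t`) or between input pixels (fractional `x`, `y`) is the same operation as querying on the input grid, which is what enables arbitrary-scale sampling.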

4 retrieved papers

V3 end-to-end framework

An end-to-end trainable system that uses a neural encoder with large spatio-temporal receptive field to predict VFF coefficients from low-resolution input videos. The framework enables continuous space-time video super-resolution by sampling the learned representation at arbitrary scales.
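The arbitrary-scale querying step of such a pipeline can be sketched as follows. The encoder is replaced by random placeholder coefficients (a real system would predict them from the low-resolution video); grid sizes, variable names, and the single-channel field are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the neural encoder's output: K sinusoidal terms with random
# amplitudes, phases, and integer frequencies. These are placeholders only.
K = 8
coeffs = rng.normal(size=K)
phases = rng.uniform(0.0, 2.0 * np.pi, size=K)
freqs = rng.integers(0, 4, size=(K, 3)).astype(float)

def render(scale_xy, scale_t, H=4, W=4, T=2):
    """Sample the field on a grid upscaled by arbitrary (even fractional) factors."""
    xs = np.linspace(0.0, 1.0, int(round(W * scale_xy)), endpoint=False)
    ys = np.linspace(0.0, 1.0, int(round(H * scale_xy)), endpoint=False)
    ts = np.linspace(0.0, 1.0, int(round(T * scale_t)), endpoint=False)
    grid = np.stack(np.meshgrid(xs, ys, ts, indexing="ij"), axis=-1)  # (W', H', T', 3)
    arg = 2.0 * np.pi * grid @ freqs.T                                # (W', H', T', K)
    return np.sum(coeffs * np.cos(arg + phases), axis=-1)             # (W', H', T')
```

Because the representation is continuous, `render(2.0, 3.0)` and `render(1.5, 1.0)` reuse the same coefficients; only the query grid changes.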

10 retrieved papers, 1 refutable candidate

Analytical Gaussian PSF for anti-aliasing

A closed-form mechanism for anti-aliasing that integrates a Gaussian point spread function directly into the VFF sampling process. This enables theoretically correct frequency suppression when super-resolving at different scales, without requiring learned adaptive filtering.
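Why a sinusoidal basis makes this closed-form: convolving a sinusoid of frequency f with a Gaussian of standard deviation σ simply scales its amplitude by the Gaussian's Fourier transform, exp(-2π²σ²|f|²). The sketch below assumes an isotropic unit-mass Gaussian PSF; names are illustrative, not from the paper.

```python
import numpy as np

def gaussian_attenuation(freqs, sigma):
    """Closed-form amplitude attenuation from an isotropic Gaussian PSF.

    Convolving cos(2*pi*f.p + phase) with a Gaussian of std sigma multiplies
    its amplitude by exp(-2*pi^2*sigma^2*|f|^2), the Fourier transform of the
    unit-mass Gaussian. High frequencies are suppressed; f = 0 is untouched.
    """
    f2 = np.sum(np.asarray(freqs) ** 2, axis=-1)
    return np.exp(-2.0 * np.pi**2 * sigma**2 * f2)

def sample_vff_antialiased(coeffs, phases, freqs, x, y, t, sigma):
    # Anti-aliased sampling: attenuate each basis term analytically, then sum.
    # No learned filtering is needed; sigma can be chosen per output scale.
    arg = 2.0 * np.pi * (freqs[:, 0] * x + freqs[:, 1] * y + freqs[:, 2] * t)
    att = gaussian_attenuation(freqs, sigma)
    return float(np.sum(coeffs * att * np.cos(arg + phases)))
```

Setting `sigma = 0` recovers plain sampling of the field; larger `sigma` (appropriate for coarser output grids) suppresses frequencies that would otherwise alias.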

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Video Fourier Field (VFF) representation

Contribution: V3 end-to-end framework

Contribution: Analytical Gaussian PSF for anti-aliasing