AnyUp: Universal Feature Upsampling

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: feature upsampling, representation learning
Abstract:

We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution, without encoder-specific training. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor and thus do not generalize to different feature types at inference time. In this work, we propose an inference-time feature-agnostic upsampling architecture to alleviate this limitation and improve upsampling quality. In our experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics while being efficient and easy to apply to a wide range of downstream tasks.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes AnyUp, an inference-time feature upsampling method designed to work with any vision encoder at any resolution without encoder-specific training. Within the taxonomy, it resides in the 'Inference-Time Universal Upsamplers' leaf, which contains only two papers, including this one. This is a sparse research direction within the broader 'Universal Feature Upsampling Methods' branch, suggesting the work addresses a relatively underexplored problem space compared to the encoder-specific and task-specific upsampling approaches that dominate the other branches.

The taxonomy reveals that most upsampling research concentrates on encoder-specific methods (six papers across vision foundation models and transformer architectures) or task-specific approaches (fifteen papers spanning super-resolution, dense prediction, and domain-specific applications). The 'Trainable Universal Upsamplers' sibling branch contains three papers that require training on diverse features, whereas AnyUp diverges by eliminating encoder-specific training entirely: it is trained once and then applied to new feature types at inference time. This positioning suggests the work bridges the gap between the flexibility of universal methods and the practicality of zero-shot deployment.

Among twenty candidates examined across three contributions, none were identified as clearly refuting the core claims. The main contribution 'AnyUp: feature-agnostic upsampling model' examined ten candidates with zero refutable matches, as did the 'Feature-agnostic layer' contribution. The 'Window attention architecture with crop-based training' was not evaluated against any candidates. Given this limited search scope of twenty papers from semantic search and citation expansion, the analysis suggests no immediate prior work overlap within the examined set, though the small candidate pool means substantial related work may exist beyond this sample.

Based on the limited literature search covering twenty candidates, the work appears to occupy a novel position within a sparse research direction. The taxonomy structure indicates that while universal upsampling is an established goal, inference-time approaches without training remain rare. However, the small search scope and the presence of only one sibling paper limit confidence in assessing broader field coverage or potential overlaps with work outside the examined candidates.

Taxonomy

Core-task taxonomy papers: 24
Claimed contributions: 3
Contribution candidate papers compared: 20
Refutable papers: 0

Research Landscape Overview

Core task: universal feature upsampling across vision encoders and resolutions. The field addresses the challenge of recovering high-resolution spatial detail from coarse feature maps produced by diverse vision encoders, which is essential for dense prediction tasks such as segmentation and depth estimation.

The taxonomy organizes approaches into three main branches. Universal Feature Upsampling Methods aim to develop encoder-agnostic techniques that generalize across different backbone architectures and input resolutions, often leveraging learned upsampling modules or implicit representations. Encoder-Specific Feature Upsampling tailors solutions to particular network families, exploiting architectural priors or training-time adaptations to achieve tighter integration with specific encoders. Task-Specific Resolution Enhancement focuses on domain-driven strategies, where upsampling is optimized for particular applications such as medical imaging, remote sensing, or video analysis, often incorporating task-relevant inductive biases.

Recent work has explored trade-offs between generality and performance. Universal methods like FeatUp[18] and Upsample Anything[20] pursue broad applicability by training upsampling networks that can handle features from multiple encoders without retraining, while AnyUp[0] extends this paradigm by proposing an inference-time universal upsampler that adapts on-the-fly to unseen encoders and resolutions. This contrasts with encoder-specific approaches such as Cross Resolution Attention[1] or task-driven techniques like MGD-SAM2[5], which sacrifice some generality for tighter coupling to particular architectures or domains. A key open question is whether universal upsamplers can match the fidelity of specialized methods while maintaining their flexibility.
AnyUp[0] sits within the inference-time universal branch alongside Upsample Anything[20], emphasizing zero-shot adaptability, whereas FeatUp[18] represents an earlier training-based universal approach that requires pre-training on a fixed set of encoders.

Claimed Contributions

AnyUp: feature-agnostic upsampling model

AnyUp is a universal feature upsampling method that can be trained once and then applied to features from any vision encoder at any resolution without requiring encoder-specific retraining, unlike existing methods that must be retrained for each feature extractor.

10 retrieved papers
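The train-once, apply-to-any-encoder claim implies an upsampling interface that is agnostic to the channel dimension of its input. The sketch below illustrates such an interface with plain bilinear interpolation standing in for the learned upsampler; the function name and the use of NumPy are illustrative assumptions, not AnyUp's implementation.

```python
import numpy as np

def upsample_features(features, out_h, out_w):
    """Resize a (C, h, w) feature map to (C, out_h, out_w) by bilinear
    interpolation. Stand-in for a learned upsampler: the point is the
    feature-agnostic interface, which accepts any channel count C."""
    C, h, w = features.shape
    ys = np.linspace(0.0, h - 1.0, out_h)
    xs = np.linspace(0.0, w - 1.0, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]   # vertical interpolation weights
    wx = (xs - x0)[None, None, :]   # horizontal interpolation weights
    top = features[:, y0][:, :, x0] * (1 - wx) + features[:, y0][:, :, x1] * wx
    bot = features[:, y1][:, :, x0] * (1 - wx) + features[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

# The same call handles DINO-sized (384-d) and CLIP-sized (512-d) features;
# an encoder-specific upsampler would need retraining for each.
print(upsample_features(np.random.randn(384, 16, 16), 64, 64).shape)  # (384, 64, 64)
print(upsample_features(np.random.randn(512, 24, 24), 96, 96).shape)  # (512, 96, 96)
```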
Feature-agnostic layer

A convolutional layer design that processes input channels independently using a learned kernel basis and aggregates contributions across channels, enabling the model to handle features of arbitrary dimensionality while capturing structural information.

10 retrieved papers
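The described mechanism, channel-independent filtering with a shared kernel basis followed by cross-channel aggregation, can be sketched as follows. The mean aggregation and the 3x3 basis size are assumptions for illustration; the report does not specify AnyUp's exact formulation.

```python
import numpy as np

def feature_agnostic_layer(features, basis):
    """Convolve every input channel independently with each of K shared
    basis kernels, then aggregate over channels (here: mean), so the
    output dimension K is independent of the input channel count C.

    features: (C, h, w), basis: (K, 3, 3) -> output (K, h, w)."""
    C, h, w = features.shape
    K = basis.shape[0]
    padded = np.pad(features, ((0, 0), (1, 1), (1, 1)))  # zero-pad spatially
    out = np.empty((K, h, w))
    for k in range(K):
        resp = np.zeros((C, h, w))
        for dy in range(3):          # explicit 3x3 convolution, per channel
            for dx in range(3):
                resp += basis[k, dy, dx] * padded[:, dy:dy + h, dx:dx + w]
        out[k] = resp.mean(axis=0)   # aggregate contributions across channels
    return out

# Arbitrary input dimensionality, fixed output dimensionality:
basis = np.random.randn(8, 3, 3)
print(feature_agnostic_layer(np.random.randn(384, 16, 16), basis).shape)  # (8, 16, 16)
print(feature_agnostic_layer(np.random.randn(512, 16, 16), basis).shape)  # (8, 16, 16)
```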
Window attention architecture with crop-based training

An upsampling architecture that restricts attention computation to local windows and employs a training strategy using randomly sampled image crops as supervision, combined with consistency regularization to preserve the original feature space.

0 retrieved papers
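Restricting attention to local windows reduces the cost from quadratic in the total number of tokens to quadratic only in the window size, which is what makes attention-based upsampling affordable at high output resolutions. Below is a minimal sketch of windowed self-attention (non-overlapping windows, single head); AnyUp's exact variant and the crop-based training loop are not specified in this report.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def window_attention(q, k, v, window=4):
    """Self-attention computed independently inside each non-overlapping
    window x window patch. q, k, v: (H, W, d) with H, W divisible by
    `window`. Cost is O(H * W * window**2 * d) instead of O((H * W)**2 * d)."""
    H, W, d = q.shape
    out = np.empty_like(v)
    for y in range(0, H, window):
        for x in range(0, W, window):
            Q = q[y:y + window, x:x + window].reshape(-1, d)
            K = k[y:y + window, x:x + window].reshape(-1, d)
            V = v[y:y + window, x:x + window].reshape(-1, d)
            A = softmax(Q @ K.T / np.sqrt(d))  # (window^2, window^2)
            out[y:y + window, x:x + window] = (A @ V).reshape(window, window, d)
    return out

q, k = np.random.randn(2, 8, 8, 16)
print(window_attention(q, k, np.ones((8, 8, 16))).shape)  # (8, 8, 16)
```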

Core Task Comparisons

Comparisons with papers in the same taxonomy category found no refuting matches within the examined candidate set.

Contribution Analysis

Detailed comparisons were run for each claimed contribution, as described above under 'Claimed Contributions'. For 'AnyUp: feature-agnostic upsampling model' and 'Feature-agnostic layer', ten candidates each were compared with zero refutable matches; 'Window attention architecture with crop-based training' retrieved no candidates and was not evaluated.