AnyUp: Universal Feature Upsampling
Overview
Overall Novelty Assessment
The paper proposes AnyUp, an inference-time feature upsampling method designed to work with any vision encoder at any resolution without encoder-specific training. Within the taxonomy, it resides in the 'Inference-Time Universal Upsamplers' leaf, which contains only two papers, including this one. This sparse leaf within the broader 'Universal Feature Upsampling Methods' branch suggests the work addresses a relatively underexplored problem, in contrast to the encoder-specific and task-specific upsampling approaches that dominate the other branches.
The taxonomy reveals that most upsampling research concentrates on encoder-specific methods (six papers across vision foundation models and transformer architectures) or task-specific approaches (fifteen papers spanning super-resolution, dense prediction, and domain-specific applications). The 'Trainable Universal Upsamplers' sibling branch contains three papers that require training on diverse feature sets, whereas AnyUp diverges by eliminating encoder-specific retraining: trained once, it transfers to new encoders at inference time. This positioning suggests the work bridges the gap between the flexibility of universal methods and the practicality of zero-shot deployment.
Among the twenty candidates examined across three contributions, none was identified as clearly refuting the core claims. The main contribution, 'AnyUp: feature-agnostic upsampling model', was examined against ten candidates with zero refutable matches, as was the 'Feature-agnostic layer' contribution. The 'Window attention architecture with crop-based training' contribution was not evaluated against any candidates. Given the limited search scope of twenty papers drawn from semantic search and citation expansion, the analysis suggests no immediate prior-work overlap within the examined set, though the small candidate pool means substantial related work may exist beyond this sample.
Based on the limited literature search covering twenty candidates, the work appears to occupy a novel position within a sparse research direction. The taxonomy structure indicates that while universal upsampling is an established goal, inference-time approaches without training remain rare. However, the small search scope and the presence of only one sibling paper limit confidence in assessing broader field coverage or potential overlaps with work outside the examined candidates.
Taxonomy
Research Landscape Overview
Claimed Contributions
AnyUp is a universal feature upsampling method that can be trained once and then applied to features from any vision encoder at any resolution without requiring encoder-specific retraining, unlike existing methods that must be retrained for each feature extractor.
A convolutional layer design that processes input channels independently using a learned kernel basis and aggregates contributions across channels, enabling the model to handle features of arbitrary dimensionality while capturing structural information.
An upsampling architecture that restricts attention computation to local windows and employs a training strategy using randomly sampled image crops as supervision, combined with consistency regularization to preserve the original feature space.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[20] Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling
Contribution Analysis
Detailed comparisons for each claimed contribution
AnyUp: feature-agnostic upsampling model
AnyUp is a universal feature upsampling method that can be trained once and then applied to features from any vision encoder at any resolution without requiring encoder-specific retraining, unlike existing methods that must be retrained for each feature extractor.
[3] FeatSharp: Your Vision Model Features, Sharper
[35] Swin Transformer V2: Scaling up capacity and resolution
[36] Arbitrary-Scale Image Generation and Upsampling Using Latent Diffusion Model and Implicit Neural Decoder
[37] Upsample guidance: Scale up diffusion models without training
[38] Learned image downscaling for upscaling using content adaptive resampler
[39] Recent advances in 2D image upscaling: a comprehensive review
[40] U-REPA: Aligning diffusion U-Nets to ViTs
[41] SFA-Net: Semantic Feature Adjustment Network for Remote Sensing Image Segmentation
[42] NAS-FPN: Learning scalable feature pyramid architecture for object detection
[43] Annotation-free open-vocabulary segmentation for remote-sensing images
Feature-agnostic layer
A convolutional layer design that processes input channels independently using a learned kernel basis and aggregates contributions across channels, enabling the model to handle features of arbitrary dimensionality while capturing structural information.
[25] Fully convolutional mesh autoencoder using efficient spatially varying kernels
[26] SplineCNN: Fast geometric deep learning with continuous B-spline kernels
[27] gWaveNet: Classification of gravity waves from noisy satellite data using custom kernel integrated deep learning method
[28] Towards a General Purpose CNN for Long Range Dependencies in ND
[29] Convolutional kernel-based classification of industrial alarm floods
[30] Omni-dimensional dynamic convolution feature coordinate attention network for pneumonia classification
[31] Expert Kernel Generation Network Driven by Contextual Mapping for Hyperspectral Image Classification
[32] Learning One Convolutional Layer with Overlapping Patches
[33] Deep Network With Irregular Convolutional Kernels and Self-Expressive Property for Classification of Hyperspectral Images
[34] K3DN: Disparity-Aware Kernel Estimation for Dual-Pixel Defocus Deblurring
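The mechanism described in this contribution, a convolution that processes each input channel independently through a shared learned kernel basis and then aggregates the per-channel responses, can be illustrated with a minimal NumPy sketch. All names and shapes here (`feature_agnostic_conv`, a `(B, k, k)` basis, mean aggregation over channels) are illustrative assumptions, not the paper's actual layer:

```python
import numpy as np

def feature_agnostic_conv(features, kernel_basis):
    """Channel-count-agnostic convolution sketch.

    features:     (C, H, W) feature map with arbitrary channel count C.
    kernel_basis: (B, k, k) shared basis of B spatial kernels (k odd).
    Returns:      (B, H, W) response map whose depth is independent of C.
    """
    C, H, W = features.shape
    B, k, _ = kernel_basis.shape
    pad = k // 2
    padded = np.pad(features, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros((B, H, W))
    for b in range(B):
        for c in range(C):          # each channel is convolved independently
            for i in range(H):
                for j in range(W):
                    patch = padded[c, i:i + k, j:j + k]
                    out[b, i, j] += np.sum(patch * kernel_basis[b])
    return out / C                  # aggregate (mean) over channels
```

Because the per-channel responses are summed into a fixed number of basis outputs and then averaged, the output depth `B` does not depend on the input channel count `C`, which is what allows a single layer to accept features of arbitrary dimensionality while still responding to spatial structure.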
Window attention architecture with crop-based training
An upsampling architecture that restricts attention computation to local windows and employs a training strategy using randomly sampled image crops as supervision, combined with consistency regularization to preserve the original feature space.
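The local-window restriction described above can be illustrated with a small NumPy sketch of windowed attention in general; the function name, shapes, and non-overlapping `window x window` partition are assumptions for illustration and do not reproduce AnyUp's exact architecture or its crop-based training loop:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(q, k, v, window=4):
    """Attention restricted to non-overlapping local windows.

    q, k, v: (H, W, d) maps; H and W are assumed divisible by `window`.
    Each output position attends only to the window*window positions in
    its own window, never to the full H*W grid.
    """
    H, W, d = q.shape
    out = np.zeros_like(v)
    for i in range(0, H, window):
        for j in range(0, W, window):
            Q = q[i:i + window, j:j + window].reshape(-1, d)
            K = k[i:i + window, j:j + window].reshape(-1, d)
            V = v[i:i + window, j:j + window].reshape(-1, d)
            A = softmax(Q @ K.T / np.sqrt(d))   # (n, n) within the window only
            out[i:i + window, j:j + window] = (A @ V).reshape(window, window, d)
    return out
```

Restricting attention this way reduces the pairwise cost from O((HW)^2) to O(HW * window^2) and conditions each upsampled feature only on its local neighborhood; in the training strategy described above, random image crops would then supply supervision at multiple effective resolutions while a consistency term keeps outputs in the original feature space.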