ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains

ICLR 2026 Conference Submission • Anonymous Authors

OpenReview Score: 6.0

Keywords: instance-level image retrieval, image re-ranking, local similarity, generalization, interpretability

Abstract:

Large-scale instance-level training data is scarce, so models are typically trained on domain-specific datasets. Yet in real-world retrieval, they must handle diverse domains, making generalization to unseen data critical. We introduce ELViS, an image-to-image similarity model that generalizes effectively to unseen domains. Unlike conventional approaches, our model operates in similarity space rather than representation space, promoting cross-domain transfer. It leverages local descriptor correspondences, refines their similarities through an optimal transport step with data-dependent gains that suppress uninformative descriptors, and aggregates strong correspondences via a voting process into an image-level similarity. This design injects strong inductive biases, yielding a simple, efficient, and interpretable model. To assess generalization, we compile a benchmark of eight datasets spanning landmarks, artworks, products, and multi-domain collections, and evaluate ELViS as a re-ranking method. Our experiments show that ELViS outperforms competing methods by a large margin in out-of-domain scenarios and on average, while requiring only a fraction of their computational cost.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces ELViS, a similarity-based re-ranking model for cross-domain image retrieval that operates in similarity space rather than representation space. It sits within the 'Cross-Domain Correspondence via Deep Features' leaf of the taxonomy, which contains four papers total including ELViS. This leaf is part of the broader 'Correspondence Establishment and Matching' branch, indicating a moderately populated research direction focused on leveraging deep neural network features to establish correspondences across visually distinct domains. The taxonomy shows this is an active but not overcrowded area, with sibling papers exploring related correspondence mechanisms using deep features.

The taxonomy reveals neighboring research directions that contextualize ELViS's position. Adjacent leaves include 'Semantic-Guided Correspondence' (2 papers) and 'Geometric and Appearance-Based Matching' (1 paper), both addressing correspondence establishment through different mechanisms. The broader 'Domain Adaptation and Transfer Learning' branch (7 papers across three leaves) tackles domain shift through feature alignment rather than correspondence reasoning. ELViS diverges from these by emphasizing similarity-space operations and optimal transport refinement, connecting conceptually to correspondence-based methods while introducing a distinct architectural approach that prioritizes interpretability and efficiency over pure feature alignment.

Among the 30 candidates examined, the contribution-level analysis reveals mixed novelty signals. For the core ELViS re-ranking model (Contribution 1), none of the 10 candidates examined refutes the claim, suggesting relative novelty in its similarity-space formulation. For the optimal transport refinement with descriptor-dependent gains (Contribution 2), 2 of the 10 candidates examined can refute the claim, indicating some overlap with prior work on correspondence refinement. For the cross-domain generalization benchmark (Contribution 3), none of the 10 candidates examined refutes the claim, suggesting that this evaluation framework addresses a gap in existing benchmarks. Because the search was limited, these findings reflect the top-30 semantic matches rather than exhaustive coverage.

Based on the limited literature search, ELViS appears to offer meaningful contributions in similarity-space modeling and benchmark construction, while its optimal transport component overlaps more substantially with prior work. The taxonomy context suggests the paper occupies a moderately explored niche within correspondence-based retrieval, with room for differentiation through its specific design choices. The analysis covers only the top-30 semantic matches and does not claim exhaustive field coverage, so additional related work may exist in less semantically similar papers or specialized venues.

Taxonomy

[Figure: taxonomy tree. Legend: core-task taxonomy papers, claimed contributions, contribution candidate papers compared, refutable paper.]

Research Landscape Overview

Core task: cross-domain image retrieval using local descriptor correspondences. The field addresses the challenge of matching images across domains where appearance, viewpoint, or modality differ substantially. The taxonomy organizes research into five main branches: Local Feature Extraction and Descriptor Learning focuses on designing robust low-level representations that remain discriminative under domain shifts; Correspondence Establishment and Matching develops algorithms to align features between domains, often leveraging deep networks to capture semantic similarities; Domain Adaptation and Transfer Learning applies techniques to bridge distributional gaps, enabling models trained on one domain to generalize to another; Cross-Domain Retrieval Applications targets specific scenarios such as sketch-to-photo retrieval, day-night localization, or underwater-to-surface matching; and Generalization and Zero-Shot Learning explores methods that handle unseen domain combinations without retraining. Representative works like Few-shot Cross-domain[1] and DomainFeat[7] illustrate how adaptation strategies can be combined with local feature reasoning to improve retrieval robustness.

Within Correspondence Establishment and Matching, a particularly active line of work uses deep features to establish cross-domain correspondences, balancing semantic richness with spatial precision. Neural Best-Buddies[13] pioneered the use of internal CNN activations to find semantically consistent matches across large appearance gaps, while Cross-domain Feature Maps[16] extended this idea to handle more diverse domain shifts. ELViS[0] sits naturally within this cluster, emphasizing local descriptor correspondences via deep features to achieve robust cross-domain retrieval. Compared to GrownBB[22], which refines correspondence hierarchies, ELViS[0] focuses on leveraging local descriptors more directly for retrieval tasks. Meanwhile, methods like LNIFT[3] and Underwater Correspondence[4] address specialized domain pairs, highlighting the trade-off between general-purpose correspondence frameworks and domain-specific tuning. The central open question remains how to balance the expressiveness of learned deep features with the geometric consistency required for reliable cross-domain matching.

Claimed Contributions

ELViS: similarity-based re-ranking model for cross-domain image retrieval

10 retrieved papers

The authors propose ELViS, a novel image-to-image similarity model that operates on local descriptor correspondences rather than raw descriptors. The model refines similarity matrices using optimal transport with descriptor-dependent dustbin gains and aggregates strong correspondences through a learnable voting mechanism, achieving better generalization to unseen domains than prior descriptor-based methods.
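To make the first and last stages of that description concrete, here is a minimal pure-Python sketch that builds a local-descriptor similarity matrix and aggregates strong correspondences into one image-level score. It is not the authors' model: the mutual-nearest-neighbor filter, the fixed `threshold`, and all function names are illustrative assumptions standing in for the learnable voting mechanism.

```python
import math

def cosine_similarity_matrix(desc_a, desc_b):
    """Pairwise cosine similarity between two lists of local descriptors."""
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]
    unit_a = [normalize(v) for v in desc_a]
    unit_b = [normalize(v) for v in desc_b]
    return [[sum(x * y for x, y in zip(u, v)) for v in unit_b] for u in unit_a]

def mutual_nearest_neighbors(sim):
    """Pairs (i, j) where row i and column j pick each other as best match."""
    n_rows, n_cols = len(sim), len(sim[0])
    best_col = [max(range(n_cols), key=row.__getitem__) for row in sim]
    best_row = [max(range(n_rows), key=lambda i: sim[i][j]) for j in range(n_cols)]
    return [(i, j) for i, j in enumerate(best_col) if best_row[j] == i]

def vote_similarity(sim, threshold=0.5):
    """Assumed fixed-rule vote: sum the similarities of strong mutual matches."""
    return sum(sim[i][j] for i, j in mutual_nearest_neighbors(sim)
               if sim[i][j] >= threshold)
```

The point of the sketch is that the score depends only on the similarity matrix, never on the raw descriptor values, which is the sense in which such a model operates in similarity space.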

Optimal transport refinement with descriptor-dependent dustbin gains

Can Refute

10 retrieved papers

The authors introduce a variant of optimal transport that uses descriptor-dependent dustbin gains (computed via a learned function h) to suppress uninformative or background descriptors. This refinement step produces a doubly stochastic similarity matrix that emphasizes mutually consistent strong correspondences while discarding distracting descriptors.
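The refinement step can be illustrated with a log-domain Sinkhorn iteration over a similarity matrix augmented by one dustbin row and one dustbin column. This is a sketch under stated assumptions, not the paper's formulation: the per-descriptor gains `gain_a` and `gain_b` are passed in directly (standing in for the outputs of the learned function h), the marginals follow the common SuperGlue-style convention in which each dustbin can absorb all descriptors of the other image, and `temperature` and `n_iters` are hypothetical hyperparameters.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def sinkhorn_with_dustbins(sim, gain_a, gain_b, n_iters=50, temperature=0.1):
    """Return the (n+1) x (m+1) transport plan; last row/column are dustbins."""
    n, m = len(sim), len(sim[0])
    # Augment scores: each descriptor gets its own dustbin gain (assumption).
    scores = [[s / temperature for s in row] + [gain_a[i] / temperature]
              for i, row in enumerate(sim)]
    scores.append([g / temperature for g in gain_b] + [0.0])
    # Log marginals: unit mass per descriptor, dustbins hold the remainder.
    log_mu = [0.0] * n + [math.log(m)]
    log_nu = [0.0] * m + [math.log(n)]
    u = [0.0] * (n + 1)
    v = [0.0] * (m + 1)
    for _ in range(n_iters):
        # Alternate row and column normalization in log space.
        u = [log_mu[i] - logsumexp([scores[i][j] + v[j] for j in range(m + 1)])
             for i in range(n + 1)]
        v = [log_nu[j] - logsumexp([scores[i][j] + u[i] for i in range(n + 1)])
             for j in range(m + 1)]
    return [[math.exp(scores[i][j] + u[i] + v[j]) for j in range(m + 1)]
            for i in range(n + 1)]
```

In this setup, a descriptor whose dustbin gain exceeds all of its similarities sends most of its mass to the dustbin, so its entries in the top-left n x m block (the refined similarities) shrink toward zero.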

Cross-domain generalization benchmark for instance-level retrieval

10 retrieved papers

The authors compile a benchmarking protocol unifying eight existing datasets across diverse domains (landmarks, household items, retail products, artworks, and multi-domain sets) and introduce an evaluation framework that distinguishes in-domain and out-of-domain test sets. This is presented as the first extensive evaluation of single-source domain generalization in instance-level retrieval.
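A protocol of this kind can be sketched as re-ranking a shortlist with a pairwise similarity score, scoring each query, and then averaging per-dataset results separately for in-domain and out-of-domain sets. The report does not state the metric or the dataset names, so average precision, the dictionary layout, and every identifier below are assumptions made for illustration only.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def rerank(shortlist, score_by_id, top_k):
    """Reorder only the first top_k items by descending pairwise score."""
    head = sorted(shortlist[:top_k], key=lambda item: score_by_id[item],
                  reverse=True)
    return head + shortlist[top_k:]

def split_means(score_by_dataset, domain_by_dataset, train_domain):
    """Mean score over in-domain and over out-of-domain datasets."""
    def mean(values):
        return sum(values) / len(values) if values else float("nan")
    in_domain = [s for name, s in score_by_dataset.items()
                 if domain_by_dataset[name] == train_domain]
    out_domain = [s for name, s in score_by_dataset.items()
                  if domain_by_dataset[name] != train_domain]
    return mean(in_domain), mean(out_domain)
```

As a toy check, for the shortlist `["a", "b", "c", "d"]` with relevant items `{"b", "d"}`, re-ranking the top 3 with scores that favor `"b"` moves it to rank 1 and raises average precision from 0.5 to 0.75.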

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[13] Neural best-buddies: Sparse cross-domain correspondence

Kfir Aberman, Jing Liao, Mingyi Shi, Dani Lischinski, Baoquan Chen, Daniel Cohen-Or (2018)

[16] Cross-domain image matching with deep feature maps

Bailey Kong, Deva Ramanan, Charless C. Fowlkes (2019)

[22] GrownBB: Gromov-Wasserstein learning of neural best buddies for cross-domain correspondence

Ruolan Tang, Weiwei Wang, Yu Han, Xiangchu Feng (2024) • The Visual Computer

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

ELViS: similarity-based re-ranking model for cross-domain image retrieval

[9] Cross-domain image retrieval: methods and applications

Cannot Refute

[21] Multi-level domain adaptive learning for cross-domain detection

Cannot Refute

[46] OmniGlue: Generalizable feature matching with foundation model guidance

Cannot Refute

[47] Bridging the domain gap for ground-to-aerial image matching

Cannot Refute

[48] Cross-domain image retrieval with a dual attribute-aware ranking network

Cannot Refute

[49] Pixel matching network for cross-domain few-shot segmentation

Cannot Refute

[50] A Cross-View Image Matching Method with Feature Enhancement

Cannot Refute

[51] CVM-Net: Cross-view matching network for image-based ground-to-aerial geo-localization

Cannot Refute

[52] Sketch-based shape retrieval via best view selection and a cross-domain similarity measure

Cannot Refute

[53] Siamese transformer network-based similarity metric learning for cross-source remote sensing image retrieval

Cannot Refute

Contribution

Optimal transport refinement with descriptor-dependent dustbin gains

[54] Optimal transport aggregation for visual place recognition

Can Refute

[60] SuperGlue: Learning feature matching with graph neural networks

Can Refute

[55] Optimal transport for transfer learning across spaces

Cannot Refute

[56] Semantic correspondence as an optimal transport problem

Cannot Refute

[57] AOT: Aggregation Optimal Transport for Few-Shot SAR Automatic Target Recognition

Cannot Refute

[58] Feature Robust Optimal Transport for High-dimensional Data

Cannot Refute

[59] The Self-Optimal-Transport Feature Transform

Cannot Refute

[61] Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition

Cannot Refute

[62] Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport

Cannot Refute

[63] Deep Shells: Unsupervised Shape Correspondence with Optimal Transport

Cannot Refute

Contribution

Cross-domain generalization benchmark for instance-level retrieval

[64] CoIR: A comprehensive benchmark for code information retrieval models

Cannot Refute

[65] Domain generalization with MixStyle

Cannot Refute

[66] BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics

Cannot Refute

[67] FGPR: A large-scale dataset and benchmark for fine-grained product retrieval

Cannot Refute

[68] UniIR: Training and benchmarking universal multimodal information retrievers

Cannot Refute

[69] Meta batch-instance normalization for generalizable person re-identification

Cannot Refute

[70] RAR-b: Reasoning as retrieval benchmark

Cannot Refute

[71] Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval

Cannot Refute

[72] FORB: a flat object retrieval benchmark for universal image embedding

Cannot Refute

[73] GPR1200: a benchmark for general-purpose content-based image retrieval

Cannot Refute
