ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: instance-level image retrieval, image re-ranking, local similarity, generalization, interpretability
Abstract:

Large-scale instance-level training data is scarce, so models are typically trained on domain-specific datasets. Yet in real-world retrieval, they must handle diverse domains, making generalization to unseen data critical. We introduce ELViS, an image-to-image similarity model that generalizes effectively to unseen domains. Unlike conventional approaches, our model operates in similarity space rather than representation space, promoting cross-domain transfer. It leverages local descriptor correspondences, refines their similarities through an optimal transport step with data-dependent gains that suppress uninformative descriptors, and aggregates strong correspondences via a voting process into an image-level similarity. This design injects strong inductive biases, yielding a simple, efficient, and interpretable model. To assess generalization, we compile a benchmark of eight datasets spanning landmarks, artworks, products, and multi-domain collections, and evaluate ELViS as a re-ranking method. Our experiments show that ELViS outperforms competing methods by a large margin in out-of-domain scenarios and on average, while requiring only a fraction of their computational cost.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces ELViS, a similarity-based re-ranking model for cross-domain image retrieval that operates in similarity space rather than representation space. It sits within the 'Cross-Domain Correspondence via Deep Features' leaf of the taxonomy, which contains four papers total including ELViS. This leaf is part of the broader 'Correspondence Establishment and Matching' branch, indicating a moderately populated research direction focused on leveraging deep neural network features to establish correspondences across visually distinct domains. The taxonomy shows this is an active but not overcrowded area, with sibling papers exploring related correspondence mechanisms using deep features.

The taxonomy reveals neighboring research directions that contextualize ELViS's position. Adjacent leaves include 'Semantic-Guided Correspondence' (2 papers) and 'Geometric and Appearance-Based Matching' (1 paper), both addressing correspondence establishment through different mechanisms. The broader 'Domain Adaptation and Transfer Learning' branch (7 papers across three leaves) tackles domain shift through feature alignment rather than correspondence reasoning. ELViS diverges from these by emphasizing similarity-space operations and optimal transport refinement, connecting conceptually to correspondence-based methods while introducing a distinct architectural approach that prioritizes interpretability and efficiency over pure feature alignment.

Across the 30 candidates examined (10 per contribution), the contribution-level analysis gives mixed novelty signals. For the core ELViS re-ranking model (Contribution 1), none of the 10 candidates refutes the claim, suggesting relative novelty in its similarity-space formulation. For the optimal transport refinement with descriptor-dependent gains (Contribution 2), 2 of the 10 candidates are refutable, indicating some overlap with prior work on correspondence refinement. For the cross-domain generalization benchmark (Contribution 3), none of the 10 candidates refutes the claim, suggesting that this evaluation framework addresses a gap in existing benchmarks. Because the search scope is limited, these findings reflect the top-30 semantic matches rather than exhaustive coverage.

Based on the limited literature search, ELViS appears to offer meaningful contributions in similarity-space modeling and benchmark construction, while its optimal transport component overlaps more substantially with prior work. The taxonomy context suggests the paper occupies a moderately explored niche within correspondence-based retrieval, with room for differentiation through its specific design choices. The analysis covers only the top-30 semantic matches and does not claim exhaustive field coverage, so additional related work may exist in less semantically similar papers or in specialized venues.

Taxonomy

Core-task Taxonomy Papers: 45
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 2

Research Landscape Overview

Core task: cross-domain image retrieval using local descriptor correspondences. The field addresses the challenge of matching images across domains where appearance, viewpoint, or modality differ substantially. The taxonomy organizes research into five main branches: Local Feature Extraction and Descriptor Learning focuses on designing robust low-level representations that remain discriminative under domain shifts; Correspondence Establishment and Matching develops algorithms to align features between domains, often leveraging deep networks to capture semantic similarities; Domain Adaptation and Transfer Learning applies techniques to bridge distributional gaps, enabling models trained on one domain to generalize to another; Cross-Domain Retrieval Applications targets specific scenarios such as sketch-to-photo retrieval, day-night localization, or underwater-to-surface matching; and Generalization and Zero-Shot Learning explores methods that handle unseen domain combinations without retraining. Representative works like Few-shot Cross-domain[1] and DomainFeat[7] illustrate how adaptation strategies can be combined with local feature reasoning to improve retrieval robustness.

Within Correspondence Establishment and Matching, a particularly active line of work uses deep features to establish cross-domain correspondences, balancing semantic richness with spatial precision. Neural Best-Buddies[13] pioneered the use of internal CNN activations to find semantically consistent matches across large appearance gaps, while Cross-domain Feature Maps[16] extended this idea to handle more diverse domain shifts. ELViS[0] sits naturally within this cluster, emphasizing local descriptor correspondences via deep features to achieve robust cross-domain retrieval. Compared to GrownBB[22], which refines correspondence hierarchies, ELViS[0] focuses on leveraging local descriptors more directly for retrieval tasks. Meanwhile, methods like LNIFT[3] and Underwater Correspondence[4] address specialized domain pairs, highlighting the trade-off between general-purpose correspondence frameworks and domain-specific tuning. The central open question remains how to balance the expressiveness of learned deep features with the geometric consistency required for reliable cross-domain matching.

Claimed Contributions

ELViS: similarity-based re-ranking model for cross-domain image retrieval

The authors propose ELViS, a novel image-to-image similarity model that operates on local descriptor correspondences rather than raw descriptors. The model refines similarity matrices using optimal transport with descriptor-dependent dustbin gains and aggregates strong correspondences through a learnable voting mechanism, achieving better generalization to unseen domains than prior descriptor-based methods.
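To make the "similarity space" idea concrete, the following is a minimal sketch and not the authors' implementation. It builds a cosine similarity matrix between two sets of local descriptors and sums the mutual-nearest-neighbour similarities above a fixed threshold. The function names, the threshold of 0.5, and the hand-written mutual-nearest-neighbour rule are assumptions for illustration; the paper describes a learnable voting mechanism, which this fixed rule only stands in for.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_matrix(desc_a, desc_b):
    """Cosine similarity between every descriptor in desc_a and desc_b."""
    a = [l2_normalize(v) for v in desc_a]
    b = [l2_normalize(v) for v in desc_b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]


def vote_similarity(sim, threshold=0.5):
    """Hypothetical stand-in for voting: sum the similarities of mutual
    nearest neighbours whose similarity reaches the threshold."""
    if not sim or not sim[0]:
        return 0.0
    n_rows, n_cols = len(sim), len(sim[0])
    best_col = [max(range(n_cols), key=lambda j: sim[i][j]) for i in range(n_rows)]
    best_row = [max(range(n_rows), key=lambda i: sim[i][j]) for j in range(n_cols)]
    score = 0.0
    for i, j in enumerate(best_col):
        if best_row[j] == i and sim[i][j] >= threshold:
            score += sim[i][j]
    return score


# Toy descriptors (made up): the matching image shares both local
# descriptors with the query, the other image shares none.
query = [[1.0, 0.0], [0.0, 1.0]]
matching = [[2.0, 0.0], [0.0, 3.0]]
other = [[-1.0, 0.0], [0.0, -1.0]]
score_match = vote_similarity(similarity_matrix(query, matching))
score_other = vote_similarity(similarity_matrix(query, other))
```

Note that the image-level score depends only on the similarity matrix, never on the raw descriptor values, which is the property the report attributes to the similarity-space formulation.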

10 retrieved papers

Optimal transport refinement with descriptor-dependent dustbin gains

The authors introduce a variant of optimal transport that uses descriptor-dependent dustbin gains (computed via a learned function h) to suppress uninformative or background descriptors. This refinement step produces a doubly stochastic similarity matrix that emphasizes mutually consistent strong correspondences while discarding distracting descriptors.
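The refinement described above can be sketched with a log-domain Sinkhorn iteration on a similarity matrix augmented with one dustbin row and one dustbin column. This is an illustrative sketch under stated assumptions, not the paper's formulation: the per-descriptor gains are passed in as plain numbers (in the paper they come from a learned function h), and the marginals (mass 1 per descriptor, mass m and n for the two dustbins), the temperature, and the zero dustbin-to-dustbin score are all assumed choices.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sinkhorn_with_dustbins(sim, gain_a, gain_b, temperature=0.1, iters=50):
    """Return an (n+1) x (m+1) transport plan; the last row and column are dustbins.

    gain_a[i] / gain_b[j] are assumed per-descriptor dustbin gains: a high gain
    makes it cheap to send that descriptor's mass to the dustbin.
    """
    n, m = len(sim), len(sim[0])
    # Augment scores with a dustbin column (gain_a) and a dustbin row (gain_b).
    scores = [[sim[i][j] / temperature for j in range(m)] + [gain_a[i] / temperature]
              for i in range(n)]
    scores.append([gain_b[j] / temperature for j in range(m)] + [0.0])
    # Assumed marginals: unit mass per descriptor, dustbins absorb the rest.
    log_r = [0.0] * n + [math.log(m)]
    log_c = [0.0] * m + [math.log(n)]
    u = [0.0] * (n + 1)
    v = [0.0] * (m + 1)
    for _ in range(iters):
        # Alternate row and column normalization in the log domain.
        u = [log_r[i] - logsumexp([scores[i][j] + v[j] for j in range(m + 1)])
             for i in range(n + 1)]
        v = [log_c[j] - logsumexp([scores[i][j] + u[i] for i in range(n + 1)])
             for j in range(m + 1)]
    return [[math.exp(scores[i][j] + u[i] + v[j]) for j in range(m + 1)]
            for i in range(n + 1)]


# Toy input (made up): the third query descriptor matches nothing well and
# is given a high dustbin gain, so its mass should go to the dustbin column.
sim = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.2]]
plan = sinkhorn_with_dustbins(sim, gain_a=[0.0, 0.0, 0.6], gain_b=[0.0, 0.0])
column_sums = [sum(row[j] for row in plan) for j in range(3)]
```

In this toy case the non-dustbin columns of `plan` each sum to 1, and the uninformative third descriptor places more mass on the dustbin than on either real descriptor, which is the suppression effect the contribution describes.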

10 retrieved papers (Can Refute)

Cross-domain generalization benchmark for instance-level retrieval

The authors compile a benchmarking protocol unifying eight existing datasets across diverse domains (landmarks, household items, retail products, artworks, and multi-domain sets) and introduce an evaluation framework that distinguishes in-domain and out-of-domain test sets. This is presented as the first extensive evaluation of single-source domain generalization in instance-level retrieval.
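An evaluation that separates in-domain from out-of-domain test sets can be summarized as in the sketch below. It is a hypothetical illustration, not the benchmark's code: the dataset names, domain labels, and scores are placeholders rather than results from the paper, and the average-precision helper assumes every relevant item appears in the ranked list.

```python
from statistics import mean


def average_precision(ranked_relevance):
    """AP for one query, given relevance flags in ranked order.

    Assumes all relevant items are present in the ranked list.
    """
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def summarize(per_dataset_score, dataset_domain, train_domain):
    """Average per-dataset scores separately for in-domain and out-of-domain sets."""
    in_dom = [s for name, s in per_dataset_score.items()
              if dataset_domain[name] == train_domain]
    out_dom = [s for name, s in per_dataset_score.items()
               if dataset_domain[name] != train_domain]
    return {
        "in_domain": mean(in_dom) if in_dom else None,
        "out_of_domain": mean(out_dom) if out_dom else None,
        "average": mean(list(per_dataset_score.values())),
    }


# Placeholder names and numbers, not results reported in the paper.
scores = {"dataset_a": 0.8, "dataset_b": 0.4, "dataset_c": 0.6}
domains = {"dataset_a": "landmarks", "dataset_b": "artworks", "dataset_c": "products"}
summary = summarize(scores, domains, train_domain="landmarks")
ap_example = average_precision([True, False, True])
```

Reporting the out-of-domain mean separately keeps a strong in-domain score from masking weak transfer, which is the distinction the benchmark is said to draw.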

10 retrieved papers
