Abstract:

In information retrieval, training reranking models mainly targets two types of objectives: metric learning (e.g., a contrastive loss that raises predicted scores on relevant query-document pairs) and classification (binary prediction of relevance vs. irrelevance). For BERT-style encoders, various studies have shown that contrastive learning (CL) can be more effective than discriminative (classification) learning. However, for large language models (LLMs), classification via supervised fine-tuning (SFT), which predicts a "yes" (resp. "no") token for relevant (resp. irrelevant) pairs, appears more promising, as it aligns well with the generative nature of LLMs. This divergence raises a central question: which objective is intrinsically better suited to LLM-based reranking, and what mechanism underlies the difference? In this work, we conduct a comprehensive comparison and analysis of CL and SFT for reranking, taking universal multimodal retrieval (UMR) as the experimental playground. We first decompose the objectives into two components: direction, which guides the model updates, and weight, which controls the magnitude of those updates, and then present a unified framework for understanding their interaction. Through probing experiments, we find that SFT provides a substantially stronger weighting scheme than CL, whereas neither scoring direction emerges as a clear winner. Taken together, these results point to a consistent advantage of SFT over CL for LLM reranking. To further validate our findings, we conduct large-scale training with SFT and present new state-of-the-art rerankers on the MRB benchmark. We also provide ablations on SFT settings and expect our findings to benefit future research and applications in this area.
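To make the weight/direction decomposition concrete, here is a minimal illustrative sketch, assuming a pointwise binary cross-entropy loss for SFT and an InfoNCE-style loss for CL (the function names and exact loss forms are our assumptions, not necessarily the paper's formulation). For a relevance score s with label y, the BCE gradient with respect to the score has magnitude |sigmoid(s) − y|; for InfoNCE, the gradient on the positive score has magnitude 1 − softmax(s_pos). In both cases this scalar "weight" multiplies the same score-gradient "direction" ∂s/∂θ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sft_weight(score, label):
    """SFT viewed as pointwise binary cross-entropy on a relevance score:
    dL/ds = sigmoid(s) - y, so the per-pair 'weight' is |sigmoid(s) - y|."""
    return abs(sigmoid(score) - label)

def cl_weight(pos_score, neg_scores, temperature=1.0):
    """CL viewed as InfoNCE over one positive and a set of negatives:
    dL/ds_pos = softmax(s_pos) - 1, so the 'weight' is 1 - softmax(s_pos)."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    p_pos = exps[0] / sum(exps)
    return 1.0 - p_pos

# A hard positive that the model currently scores low:
print(round(sft_weight(-2.0, 1), 3))          # 0.881
print(round(cl_weight(-2.0, [0.0, 0.0]), 3))  # 0.937
```

Note that the CL weight depends on the whole negative set (and the temperature), while the SFT weight is determined independently per pair; comparing how these scalars behave across examples is one way to probe which objective delivers stronger update magnitudes.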

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a unified framework for analyzing contrastive learning versus supervised fine-tuning in LLM-based reranking, alongside a multimodal reranking benchmark (MRB) and state-of-the-art GMR models. It resides in the 'Contrastive vs. Supervised Fine-Tuning Objectives' leaf, which contains only two papers total (including this one). This leaf sits within the broader 'Training Objective Design and Comparison' branch, indicating a relatively sparse research direction focused specifically on direct objective comparisons for LLM rerankers. The taxonomy reveals that while training objective design is an active area, head-to-head comparisons of CL versus SFT remain underexplored.

The taxonomy shows neighboring leaves addressing reinforcement learning hybrids, specialized loss functions, and alternative supervision sources. The 'Reinforcement Learning and Hybrid Training Approaches' leaf explores multi-objective optimization, while 'Specialized Loss Functions for Reranking' examines novel loss designs for ranking errors. The 'Alternative Supervision Signals' leaf investigates LLM annotations versus click data. The original paper diverges from these by focusing on foundational objective comparison rather than hybrid methods or supervision sources, and by extending the analysis to multimodal retrieval contexts where text and vision signals interact.

Among 24 candidates examined, the unified framework contribution (4 candidates, 0 refutable) appears relatively novel, with no clear prior work decomposing objectives into weight and direction components for LLM reranking. The MRB benchmark contribution (10 candidates, 1 refutable) shows more overlap, suggesting existing multimodal evaluation resources may partially cover this ground. The GMR models contribution (10 candidates, 0 refutable) appears novel in achieving state-of-the-art multimodal reranking performance. The limited search scope means these assessments reflect top-30 semantic matches and immediate citations, not exhaustive field coverage.

Given the sparse taxonomy leaf and limited refutation signals, the work appears to occupy a relatively underexplored niche at the intersection of objective comparison and multimodal reranking. The analysis is constrained by the 24-candidate search scope and may not capture all relevant prior work in adjacent areas like BERT-based objective studies or broader multimodal retrieval benchmarks. The framework and model contributions show stronger novelty signals than the benchmark component within this limited examination.

Taxonomy

Core-task Taxonomy Papers: 8
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 1

Research Landscape Overview

Core task: Comparing training objectives for large language model based reranking. The taxonomy organizes the field into three main branches around how LLM-based rerankers are trained and refined. The first branch, Training Objective Design and Comparison, examines the choice and formulation of loss functions—particularly contrasting contrastive learning approaches with supervised fine-tuning strategies—and explores how different objectives shape model behavior; works such as Rethink BERT Rerankers[2] and FIRST[3] illustrate early efforts to understand which training signals best capture relevance. The second branch, Training Data and Supervision Sources, addresses where supervision comes from, including human annotations, click logs, and synthetic labels generated by LLMs themselves, as seen in LLM Annotations Replace Clicks[6]. The third branch, Bias Mitigation and Robustness, focuses on ensuring that rerankers generalize fairly across diverse queries and resist spurious correlations or position biases.

Several active lines of work highlight key trade-offs and open questions. One central theme is whether contrastive objectives—which encourage models to distinguish relevant from irrelevant passages—offer advantages over pointwise or listwise supervised fine-tuning, especially when training data is noisy or limited. Another theme concerns the integration of multimodal signals and the use of LLM-generated annotations to reduce reliance on expensive human labels. Within this landscape, Multimodal LLM Reranking[0] sits naturally alongside efforts like Multi-view Passage Reranking[1] and ERank[4], which also explore richer input representations and alternative training regimes.

Compared to Rethink BERT Rerankers[2], which revisited foundational BERT-based objectives, and LLM Reranking Survey[5], which provides a broader overview, the original paper emphasizes the comparative evaluation of objectives in a multimodal setting, bridging objective design with emerging data modalities.

Claimed Contributions

Unified framework for analyzing SFT and CL in LLM reranking

The authors develop a unified framework that decomposes reranking loss functions into weight and direction components, enabling systematic comparison between supervised fine-tuning and contrastive learning. Through this decomposition, they demonstrate that SFT's superior performance stems primarily from its weight component, which provides stronger optimization signals than CL's.

4 retrieved papers
MRB benchmark for multimodal reranking evaluation

The authors construct MRB (Multimodal Reranking Benchmark), a comprehensive evaluation benchmark containing 40 test datasets spanning single-modal, cross-modal, and fused-modal retrieval tasks. This benchmark enables rigorous assessment of universal multimodal reranking models across different domains and task types.

10 retrieved papers
Can Refute
GMR models achieving state-of-the-art multimodal reranking

The authors develop GMR-3B and GMR-7B, instruction-aware multimodal LLM rerankers trained using supervised fine-tuning on approximately 1.5 million diverse query-document pairs. These models establish new state-of-the-art performance on the MRB benchmark, demonstrating the practical effectiveness of their SFT-based approach for universal multimodal reranking.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Unified framework for analyzing SFT and CL in LLM reranking

The authors develop a unified framework that decomposes reranking loss functions into weight and direction components, enabling systematic comparison between supervised fine-tuning and contrastive learning. Through this decomposition, they demonstrate that SFT's superior performance stems primarily from its weight component, which provides stronger optimization signals than CL's.

Contribution

MRB benchmark for multimodal reranking evaluation

The authors construct MRB (Multimodal Reranking Benchmark), a comprehensive evaluation benchmark containing 40 test datasets spanning single-modal, cross-modal, and fused-modal retrieval tasks. This benchmark enables rigorous assessment of universal multimodal reranking models across different domains and task types.

Contribution

GMR models achieving state-of-the-art multimodal reranking

The authors develop GMR-3B and GMR-7B, instruction-aware multimodal LLM rerankers trained using supervised fine-tuning on approximately 1.5 million diverse query-document pairs. These models establish new state-of-the-art performance on the MRB benchmark, demonstrating the practical effectiveness of their SFT-based approach for universal multimodal reranking.