Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce a comprehensive benchmark for evaluating reward models (RMs) across five modalities (text, image, video, audio, and 3D), covering nine tasks with 3,725 human-annotated preference pairs. The benchmark incorporates free-form, natural-language preference descriptions rather than fixed binary labels, enabling evaluation of RMs under diverse user-specified criteria.
The authors build a large-scale multimodal preference dataset that combines general preference pairs from existing sources with newly collected instruction-tuning data. The dataset is designed to let reward models generalize across modalities and dynamically align with diverse user preferences expressed in natural language.
The authors develop two complementary reward models: a discriminative model trained with the Bradley-Terry pairwise loss, and a generative model trained with reinforcement learning to produce explicit reasoning before its judgment. Both models show significant gains on the proposed benchmark and match or exceed state-of-the-art performance on public benchmarks.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[12] InternLM-XComposer2.5-Reward: A Simple yet Effective Multi-Modal Reward Model
[19] Unified Reward Model for Multimodal Understanding and Generation
[29] BaseReward: A Strong Baseline for Multimodal Reward Model
[35] Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Contribution Analysis
Detailed comparisons for each claimed contribution
Omni-RewardBench: first omni-modal RM benchmark with free-form preferences
The authors introduce a comprehensive benchmark for evaluating reward models (RMs) across five modalities (text, image, video, audio, and 3D), covering nine tasks with 3,725 human-annotated preference pairs. The benchmark incorporates free-form, natural-language preference descriptions rather than fixed binary labels, enabling evaluation of RMs under diverse user-specified criteria; a minimal sketch of this evaluation protocol follows the reference list below.
[50] LLaVA-Critic: Learning to Evaluate Multimodal Models
[51] CHiP: Cross-Modal Hierarchical Direct Preference Optimization for Multimodal LLMs
[52] Q-Bench: A Benchmark for Multi-Modal Foundation Models on Low-Level Vision from Single Images to Pairs
[53] VideoRewardBench: Comprehensive Evaluation of Multimodal Reward Models for Video Understanding
[54] MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?
[55] Structured Preference Modeling for Reinforcement Learning-Based Fine-Tuning of Large Models
[56] Mixed-R1: Unified Reward Perspective for Reasoning Capability in Multimodal Large Language Models
[57] MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-Text Generation
[58] Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
[59] A RAG Approach for Multi-Modal Open-Ended Lifelog Question-Answering
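To make the free-form evaluation protocol concrete, here is a minimal sketch of what a benchmark item and the resulting pairwise-accuracy metric might look like. The field names (`criterion`, `human_choice`, etc.) and the `rm_prefers` callable are illustrative assumptions, not the paper's released schema.

```python
from typing import Callable

# Hypothetical Omni-RewardBench-style item. Field names are assumptions
# for illustration, not the paper's actual data schema.
item = {
    "modality": "text-to-image",
    "prompt": "A watercolor painting of a lighthouse at dusk",
    "criterion": "Prefer the image whose soft, bleeding edges best match a watercolor style.",
    "response_a": "image_a.png",
    "response_b": "image_b.png",
    "human_choice": "a",  # annotator's preference under this criterion
}

def preference_accuracy(items: list[dict],
                        rm_prefers: Callable[[dict], str]) -> float:
    """Fraction of items where the reward model's choice ('a' or 'b')
    agrees with the human annotation under the stated criterion."""
    correct = sum(rm_prefers(it) == it["human_choice"] for it in items)
    return correct / len(items)
```

Because the criterion travels with each item, the same response pair can be scored under different user-specified criteria, which is what distinguishes this setup from fixed binary-preference benchmarks.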
Omni-RewardData: multimodal preference dataset with instruction-tuning pairs
The authors build a large-scale multimodal preference dataset that combines general preference pairs from existing sources with newly collected instruction-tuning data. The dataset is designed to let reward models generalize across modalities and dynamically align with diverse user preferences expressed in natural language; an illustrative record layout follows the reference list below.
[47] VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
[12] InternLM-XComposer2.5-Reward: A Simple yet Effective Multi-Modal Reward Model
[33] M3PO: Multimodal-Model-Guided Preference Optimization for Visual Instruction Following
[42] Aligning Large Multimodal Models with Factually Augmented RLHF
[43] Multi-Modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models
[44] Tuning Large Multimodal Models for Videos Using Reinforcement Learning from AI Feedback
[45] Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
[46] Align2LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-Modal Instruction Curation
[48] Multimodal Large Language Model Is a Human-Aligned Annotator for Text-to-Image Generation
[49] Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization
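As a rough illustration of what one record in such a dataset could contain, the sketch below pairs a chosen/rejected response with the free-form criterion that justifies the preference. The `PreferenceRecord` class and its fields are hypothetical, inferred from the description above rather than taken from the released data.

```python
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    """Hypothetical record layout; field names are illustrative assumptions."""
    modality: str   # e.g. "text", "image", "video", "audio", "3d"
    prompt: str     # the instruction or query given to the model
    criterion: str  # free-form preference description in natural language
    chosen: str     # response (or asset path) preferred under the criterion
    rejected: str   # the dispreferred response

record = PreferenceRecord(
    modality="audio",
    prompt="Narrate this paragraph in a calm voice.",
    criterion="Prefer the narration with fewer background artifacts.",
    chosen="clip_0042_a.wav",
    rejected="clip_0042_b.wav",
)
```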
Omni-RewardModel: discriminative and generative omni-modal reward models
The authors develop two complementary reward models: a discriminative model trained with the Bradley-Terry pairwise loss, and a generative model trained with reinforcement learning to produce explicit reasoning before its judgment. Both models show significant gains on the proposed benchmark and match or exceed state-of-the-art performance on public benchmarks.
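For reference, here is a minimal PyTorch sketch of the Bradley-Terry pairwise objective that discriminative reward models of this kind are typically trained with. The reward values are dummy tensors standing in for the model's scalar outputs; nothing below is taken from the paper's actual training code.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen: torch.Tensor,
                       reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise Bradley-Terry objective: -log sigmoid(r_chosen - r_rejected).
    Minimizing it pushes the chosen response's scalar reward above the
    rejected one's."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Dummy scalar rewards for a batch of four preference pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.0, -0.5])
r_rejected = torch.tensor([0.4, 0.9, 1.1, -1.0])
print(bradley_terry_loss(r_chosen, r_rejected).item())
```

The generative variant instead emits a natural-language judgment and is optimized with reinforcement learning, so its training signal rewards correct final preferences rather than a scalar margin.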