ViPO: Visual Preference Optimization at Scale
Overview
Overall Novelty Assessment
The paper proposes Poly-DPO, a polynomial extension to the DPO objective designed to handle noisy preference data, and introduces ViPO, a large-scale preference dataset comprising 1M image pairs and 300K video pairs. It resides in the Direct Preference Optimization Extensions leaf, which contains four papers, including Diffusion-DPO, DRAGON, and work on multi-preference handling. This leaf sits within the broader Preference Optimization Algorithms and Objectives branch, indicating a moderately active research direction focused on adapting DPO-style frameworks to visual generation without explicit reward modeling.
The taxonomy reveals neighboring leaves addressing related challenges: Reinforcement Learning for Visual Generation explores policy-based methods, Multi-Reward and Multi-Objective Optimization tackles balancing multiple signals, and Hierarchical and Granular Preference Alignment organizes preferences across levels. The Preference Data Construction and Curation branch, particularly Synthetic and Automated Preference Data Generation, addresses dataset quality issues similar to ViPO's motivation. The scope notes clarify that this leaf excludes RL-based and reward-centric approaches, positioning the work as a direct optimization method rather than a policy gradient or reward model design contribution.
Across the 21 candidates examined in total, the Poly-DPO algorithm shows no clear refutation (1 candidate examined, 0 refutable), suggesting limited prior work on polynomial confidence adjustments in DPO. The ViPO dataset contribution examined 10 candidates with 1 refutable match, indicating some overlap in large-scale preference data construction. The insight on conflicting preference patterns examined 10 candidates with no refutations, suggesting this framing may be relatively novel. The limited search scope means these findings reflect top-K semantic matches rather than exhaustive coverage of the field.
Based on the 21 semantic matches examined, the algorithmic contribution appears less explored, while the dataset contribution overlaps with more substantial prior work. The taxonomy structure shows this research direction is neither overcrowded nor sparse, with four sibling papers addressing related DPO extensions. The analysis captures immediate neighbors but does not cover the full landscape of visual preference optimization methods across all eight major branches.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce Poly-DPO, an extension of Diffusion-DPO that adds a polynomial term to dynamically adjust sample weighting based on prediction confidence. This enables effective learning across diverse data distributions, from noisy datasets with conflicting preference patterns to trivially easy ones.
The authors construct ViPO, a large-scale, high-quality preference dataset containing 1M high-resolution image pairs across five quality dimensions and 300K video pairs across three categories. The dataset is built with state-of-the-art generative models and systematically categorized to provide reliable and balanced preference signals.
The authors identify conflicting preference patterns in existing datasets, where winner images excel in some dimensions but underperform in others, as a fundamental obstacle to scaling visual preference optimization. They show that naive optimization on such noisy data fails to learn meaningful preferences.
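The first and third contributions both hinge on how the DPO objective weights individual preference pairs. For reference, the standard DPO loss, which Diffusion-DPO adapts by replacing log-likelihood ratios with denoising-error differences, is shown below together with one plausible reading of a polynomial confidence weight. The weight w(m) and exponent p are assumptions made for illustration; the paper's exact formulation is not reproduced in this report.

```latex
% Standard DPO objective over preference pairs (y_w preferred to y_l):
\[
  \mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \big[\log \sigma(\beta\, m_\theta)\big],
  \qquad
  m_\theta
  = \log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
  - \log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}.
\]
% Hypothetical polynomial confidence weighting (an assumption, not the
% paper's formula): scale each pair's loss by a power of the model's
% current preference confidence.
\[
  \mathcal{L}_{\mathrm{Poly}}(\theta)
  = -\,\mathbb{E}\big[\, w(m_\theta)\, \log \sigma(\beta\, m_\theta) \,\big],
  \qquad
  w(m) = \big|\, 2\,\sigma(\beta m) - 1 \,\big|^{p},\quad p > 0.
\]
```

Under this reading, pairs on which the model is ambivalent (sigma near 0.5, the signature of conflicting labels) contribute little gradient, and setting w ≡ 1 recovers the plain objective.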
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Calibrated Multi-Preference Optimization for Aligning Diffusion Models
[12] Diffusion Model Alignment Using Direct Preference Optimization
[27] DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Contribution Analysis
Detailed comparisons for each claimed contribution
Poly-DPO optimization algorithm
The authors introduce Poly-DPO, an extension of Diffusion-DPO that adds a polynomial term to dynamically adjust sample weighting based on prediction confidence. This enables effective learning across diverse data distributions, from noisy datasets with conflicting preference patterns to trivially easy ones.
[51] CAPO: Confidence Aware Preference Optimization Learning for Multilingual Preferences
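To make the comparison with CAPO's confidence-aware weighting concrete, here is a minimal PyTorch sketch of a confidence-weighted DPO loss in the spirit of the description above. This is not the authors' implementation: the weighting form, the exponent `p`, and the choice to detach the weight from the computation graph are all assumptions.

```python
import torch
import torch.nn.functional as F

def poly_dpo_loss(logratio_w: torch.Tensor,
                  logratio_l: torch.Tensor,
                  beta: float = 0.1,
                  p: float = 2.0) -> torch.Tensor:
    """Hypothetical confidence-weighted DPO loss (a sketch, not the paper's code).

    logratio_w / logratio_l hold per-sample log(pi_theta / pi_ref) for the
    preferred and dispreferred outputs; Diffusion-DPO replaces these
    ratios with denoising-error differences.
    """
    margin = logratio_w - logratio_l               # implicit preference margin
    base = -F.logsigmoid(beta * margin)            # standard per-pair DPO loss
    conf = torch.sigmoid(beta * margin).detach()   # current model confidence
    weight = (2.0 * conf - 1.0).abs().pow(p)       # polynomial weight in [0, 1]
    return (weight * base).mean()

# Toy usage: four pairs, from confidently separated to near-conflicting.
lw = torch.tensor([2.0, 0.1, -0.5, 1.0], requires_grad=True)
ll = torch.tensor([0.0, 0.0, 0.0, 0.9])
loss = poly_dpo_loss(lw, ll)
loss.backward()
print(float(loss), lw.grad)
```

Detaching the weight keeps it from generating its own gradient, so it acts purely as per-sample loss modulation; whether Poly-DPO makes the same choice is not specified here.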
ViPO large-scale visual preference dataset
The authors construct ViPO, a large-scale, high-quality preference dataset containing 1M high-resolution image pairs across five quality dimensions and 300K video pairs across three categories. The dataset is built with state-of-the-art generative models and systematically categorized to provide reliable and balanced preference signals.
[54] Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
[19] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
[52] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-Video Generation
[53] VBench: Comprehensive Benchmark Suite for Video Generative Models
[55] OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
[56] Evaluating Text-to-Visual Generation with Image-to-Text Generation
[57] Learning Multi-Dimensional Human Preference for Text-to-Image Generation
[58] InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and Generation
[59] VidGen-1M: A Large-Scale Dataset for Text-to-Video Generation
[60] VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
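As a sketch of what systematic categorization and balance checking imply at the data level, the following shows one way to represent dimension-tagged pairs. The five dimension names are placeholders; this report does not enumerate ViPO's actual quality dimensions.

```python
from collections import Counter
from dataclasses import dataclass

# Placeholder labels: ViPO's actual five image quality dimensions are
# not enumerated in this report, so these names are purely illustrative.
IMAGE_DIMS = ("dim_a", "dim_b", "dim_c", "dim_d", "dim_e")

@dataclass(frozen=True)
class PreferencePair:
    prompt: str
    winner_path: str   # image preferred along `dimension`
    loser_path: str
    dimension: str     # the single quality dimension the pair contrasts

def dimension_balance(pairs: list[PreferencePair]) -> dict[str, float]:
    """Fraction of pairs per dimension, to check for a balanced signal."""
    counts = Counter(p.dimension for p in pairs)
    total = sum(counts.values()) or 1
    return {d: counts.get(d, 0) / total for d in IMAGE_DIMS}
```

Tagging each pair with the single dimension it contrasts is also what makes conflicting patterns detectable: a pair whose winner excels on one dimension but loses on another would otherwise carry contradictory labels.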
Insight on conflicting preference patterns as scaling bottleneck
The authors identify conflicting preference patterns in existing datasets, where winner images excel in some dimensions but underperform in others, as a fundamental obstacle to scaling visual preference optimization. They show that naive optimization on such noisy data fails to learn meaningful preferences.
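The failure mode is easy to reproduce in miniature: when the same pair effectively appears with both preference orders, as happens when winners dominate on one dimension but lose on another, the two DPO gradient contributions cancel. The snippet below illustrates this general property of the DPO loss; it is not an experiment from the paper.

```python
import torch
import torch.nn.functional as F

# Toy illustration: half the dataset labels A > B (the loss sees +m),
# the other half labels B > A (the loss sees -m); beta is folded into m.
m = torch.zeros(1, requires_grad=True)  # implicit margin for pair (A, B)

loss = 0.5 * (-F.logsigmoid(m)) + 0.5 * (-F.logsigmoid(-m))
loss.backward()
print(float(loss))  # log 2 ~= 0.693: already the minimum of this loss
print(m.grad)       # tensor([0.]): the two gradient terms cancel exactly
```

At margin zero this symmetric loss already sits at its minimum, so gradient descent has nothing to exploit; down-weighting such self-cancelling pairs is one natural motivation for confidence-based schemes like the polynomial term sketched earlier.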