Data Selection for LLM Alignment Using Fine-Grained Preferences
Overview
Overall Novelty Assessment
The paper proposes a data-centric approach to align large language models using fine-grained, aspect-specific preferences, introducing a preference divergence metric to quantify inter-aspect conflicts and a data selection strategy to mitigate them. It resides in the Token-Level and Sentence-Level Preference Alignment leaf, which contains four papers total, including the original work. This leaf sits within the broader Fine-Grained Preference Modeling and Optimization branch, indicating a moderately populated research direction focused on granular preference signals rather than coarse response-level feedback.
The taxonomy reveals that the paper's immediate neighbors include Multi-Aspect Preference Alignment (two papers) and several data-centric branches under Data Selection and Curation for Alignment, such as Quality-Based Data Selection Strategies (six papers) and Synthetic and Automated Preference Data Construction (five papers). The scope note for the original leaf excludes aspect-based or multi-dimensional decomposition, yet the paper explicitly addresses multi-aspect conflicts, suggesting it bridges token-level granularity with multi-aspect reasoning. This positioning places it at the boundary between fine-grained modeling and data curation, connecting algorithmic refinement with strategic dataset construction.
Among the thirty candidates examined, the analysis identified limited overlap with prior work. The first contribution, formulating direct fine-grained preference optimization and introducing preference divergence, was refuted by one of the ten candidates examined, indicating some conceptual precedent within the limited search scope. The second contribution, data selection based on preference divergence with theoretical guarantees, had no refuting candidates among the ten examined, suggesting relative novelty within the sampled literature. The third contribution, empirical validation of efficiency gains, encountered one refuting candidate among its ten, implying that efficiency-focused evaluations with reduced data have appeared in prior work, though the specific combination with preference divergence may differ.
Based on the limited search scope of thirty semantically similar candidates, the work appears to occupy a niche intersection of fine-grained preference modeling and data selection, with the preference divergence-driven selection strategy showing the least prior overlap. The analysis does not cover exhaustive citation networks or domain-specific venues, so additional related work may exist beyond the top-K semantic matches examined here.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors formulate a direct fine-grained preference optimization (DFPO) objective that extends DPO to handle multiple fine-grained preference aspects. They introduce preference divergence (PD) as a metric to quantify conflicts between different aspect-specific preferences in aggregated datasets.
The authors recast the optimization problem as a data selection task and propose selecting samples with the most negative PD values for training. They provide theoretical analysis showing loss-bound optimality of this selection strategy.
The authors conduct comprehensive experiments across multiple settings and datasets, demonstrating that their method achieves superior performance compared to full-data alignment while using only 30% of the data, validating the feasibility of alignment with fine-grained preferences.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Aligning large language models via fine-grained supervision
[17] Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
[50] Selective Preference Optimization via Token-Level Reward Function Estimation
Contribution Analysis
Detailed comparisons for each claimed contribution
Direct fine-grained preference optimization formulation and preference divergence metric
The authors formulate a direct fine-grained preference optimization (DFPO) objective that extends DPO to handle multiple fine-grained preference aspects. They introduce preference divergence (PD) as a metric to quantify conflicts between different aspect-specific preferences in aggregated datasets.
[52] Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models
[51] Improving alignment of dialogue agents via targeted human judgements
[53] Panacea: Pareto alignment via preference adaptation for LLMs
[54] Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models
[55] Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models
[56] MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search
[57] Elevating sentiment analysis with resilient grey wolf optimization-based Gaussian-enhanced quantum deep neural networks in online shopping
[58] Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
[59] ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions
[60] Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach
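The DFPO objective and the PD metric claimed above are not specified in detail in this summary, but the idea can be sketched as follows. This is a minimal illustrative instantiation, not the paper's exact formulation: `dpo_margin`, `dfpo_loss`, and `preference_divergence` are hypothetical names, the multi-aspect aggregation is assumed to be a simple weighted sum of per-aspect DPO losses, and PD is taken here as the minimum aspect-wise margin, which goes negative exactly when some aspect prefers the response the aggregate label marks as rejected.

```python
import math

def dpo_margin(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO implicit reward margin for one preference pair: positive when
    the policy favours the chosen response more strongly than the
    reference model does."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))

def dfpo_loss(aspect_margins, weights=None):
    """Sketch of a multi-aspect DPO-style loss: a weighted average of the
    per-aspect DPO losses -log sigmoid(margin) = log(1 + exp(-margin)).
    `aspect_margins` maps aspect name -> margin under that aspect's own
    preference labels (the sign flips when an aspect disagrees with the
    aggregate chosen/rejected labelling)."""
    weights = weights or {a: 1.0 for a in aspect_margins}
    total = sum(w * math.log1p(math.exp(-aspect_margins[a]))
                for a, w in weights.items())
    return total / sum(weights.values())

def preference_divergence(aspect_margins):
    """Toy PD score: the minimum aspect-wise margin. Negative PD means at
    least one aspect prefers the nominally rejected response, i.e. the
    aspects conflict on this sample."""
    return min(aspect_margins.values())
```

For a pair where helpfulness favours the chosen response (margin 0.8) but harmlessness favours the rejected one (margin -0.3), this toy PD is -0.3, flagging an inter-aspect conflict of the kind the metric is meant to surface.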
Data selection method based on preference divergence with theoretical guarantees
The authors recast the optimization problem as a data selection task and propose selecting samples with the most negative PD values for training. They provide theoretical analysis showing loss-bound optimality of this selection strategy.
[70] Bayesian Active Learning for Classification and Preference Learning
[71] Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization
[72] Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF
[73] On the role of preference variance in preference optimization
[74] ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
[75] Active preference-based Gaussian process regression for reward learning and optimization
[76] Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
[77] Transitive inference as probabilistic preference learning
[78] Language model preference evaluation with multiple weak evaluators
[79] Data Optimization for LLMs: A Survey
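The selection rule claimed above, training on the samples with the most negative PD values, can be sketched as a simple ranking step. The function name, the PD sign convention (more negative means more inter-aspect conflict), and the 30% default budget are illustrative assumptions; the paper's exact ranking rule and theoretical conditions may differ.

```python
def select_by_pd(samples, pd_scores, fraction=0.3):
    """Keep the given fraction of samples with the most negative
    preference-divergence (PD) scores, i.e. those exhibiting the
    strongest conflict between aspect-specific preferences."""
    if len(samples) != len(pd_scores):
        raise ValueError("samples and pd_scores must align")
    k = max(1, int(len(samples) * fraction))
    # Rank indices by PD ascending: most negative (most conflicted) first.
    ranked = sorted(range(len(samples)), key=lambda i: pd_scores[i])
    return [samples[i] for i in ranked[:k]]
```

With ten samples and the default 30% budget, this keeps the three most conflicted pairs, matching the reduced-data regime the empirical contribution reports.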
Empirical validation demonstrating efficiency gains with reduced data
The authors conduct comprehensive experiments across multiple settings and datasets, demonstrating that their method achieves superior performance compared to full-data alignment while using only 30% of the data, validating the feasibility of alignment with fine-grained preferences.