Data Selection for LLM Alignment Using Fine-Grained Preferences

ICLR 2026 Conference SubmissionAnonymous Authors

OpenReview Score: 6.0 Download Report PDF

Data SelectionPreference Alignment

Large language models (LLMs) alignment aims to ensure that the behavior of LLMs meets human preferences. While collecting data from multiple fine-grained, aspect-specific preferences becomes more and more feasible, existing alignment methods typically work on a single preference and thus struggle with conflicts inherent in such aggregated datasets. As one early attempt, in this paper, we propose a data-centric approach to align LLMs through the effective use of fine-grained preferences. Specifically, we formulate the problem as a direct fine-grained preference optimization and introduce preference divergence (PD) that quantifies inter-aspect preference conflicts. Instead of directly tackling the consequent complicated optimization, we recast it as a data selection problem and propose a simple yet effective strategy, which identifies a subset of data corresponding to the most negative PD values, for efficient training. We theoretically analyze the loss-bound optimality of our selection strategy and conduct extensive empirical studies on varied settings and datasets to demonstrate that our practical selection method could achieve consistent improvement against standard full-data alignment, using even just 30% of the data. Our work shares a line that LLM alignment using fine-grained preferences is highly feasible.

Abstract:

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a data-centric approach to align large language models using fine-grained, aspect-specific preferences, introducing a preference divergence metric to quantify inter-aspect conflicts and a data selection strategy to mitigate them. It resides in the Token-Level and Sentence-Level Preference Alignment leaf, which contains four papers total, including the original work. This leaf sits within the broader Fine-Grained Preference Modeling and Optimization branch, indicating a moderately populated research direction focused on granular preference signals rather than coarse response-level feedback.

The taxonomy reveals that the paper's immediate neighbors include Multi-Aspect Preference Alignment (two papers) and several data-centric branches under Data Selection and Curation for Alignment, such as Quality-Based Data Selection Strategies (six papers) and Synthetic and Automated Preference Data Construction (five papers). The scope note for the original leaf excludes aspect-based or multi-dimensional decomposition, yet the paper explicitly addresses multi-aspect conflicts, suggesting it bridges token-level granularity with multi-aspect reasoning. This positioning places it at the boundary between fine-grained modeling and data curation, connecting algorithmic refinement with strategic dataset construction.

Among thirty candidates examined, the analysis identified limited prior work overlap. The first contribution—formulating direct fine-grained preference optimization and introducing preference divergence—was refuted by one candidate out of ten examined, indicating some conceptual precedent in the limited search scope. The second contribution—data selection based on preference divergence with theoretical guarantees—found no refutable candidates among ten examined, suggesting relative novelty within the sampled literature. The third contribution—empirical validation of efficiency gains—encountered one refutable candidate among ten, implying that efficiency-focused evaluations with reduced data have appeared in prior work, though the specific combination with preference divergence may differ.

Based on the limited search scope of thirty semantically similar candidates, the work appears to occupy a niche intersection of fine-grained preference modeling and data selection, with the preference divergence-driven selection strategy showing the least prior overlap. The analysis does not cover exhaustive citation networks or domain-specific venues, so additional related work may exist beyond the top-K semantic matches examined here.

Taxonomy

Core-task Taxonomy Papers

Claimed Contributions

Contribution Candidate Papers Compared

Refutable Paper

Research Landscape Overview

Core task: Aligning large language models using fine-grained preferences with data selection. The field has evolved into several interconnected branches that address complementary aspects of preference-based alignment. Fine-Grained Preference Modeling and Optimization focuses on methods that move beyond response-level feedback to capture token-level or sentence-level signals, enabling more precise control over model behavior through techniques like Mask-DPO[17] and Fine-Grained Supervision[1]. Data Selection and Curation for Alignment emphasizes the quality and composition of training data, with works such as Good Data Alignment[12] and Clean Data Curation[13] demonstrating that careful filtering and sampling strategies can substantially improve alignment outcomes. Preference Optimization Frameworks encompasses algorithmic innovations that refine the core optimization process, including methods like Distribution Preference Optimization[31] and Noise Contrastive Alignment[32]. Domain-Specific and Multimodal Alignment extends these techniques to specialized settings, from vision-language models like Vision-R1[5] and Align2LLaVA[11] to biomedical applications such as Biomedical Clinician Preference[29]. Finally, Personalized and Individualized Alignment explores how to tailor models to diverse user preferences, as seen in Individual Preferences Interaction[3] and LifeAlign[6]. Recent work has increasingly recognized that granularity and data quality are tightly coupled challenges. While many studies pursue finer-grained supervision signals to reduce credit assignment problems, others highlight that even sophisticated optimization methods can falter without high-quality preference data, as explored in Preference Noise Impact[27] and Ambiguous Preference Pairs[36]. Fine-Grained Preferences[0] sits at the intersection of token-level modeling and strategic data selection, closely aligned with neighbors like Fine-Grained Supervision[1] and Selective Preference Optimization[50]. Compared to Mask-DPO[17], which focuses primarily on masking mechanisms for token-level credit assignment, Fine-Grained Preferences[0] integrates data selection criteria to ensure that fine-grained signals are drawn from informative examples. This dual emphasis distinguishes it from purely algorithmic refinements and positions it within an emerging cluster of methods that treat preference granularity and data curation as mutually reinforcing design choices.

Claimed Contributions

Direct fine-grained preference optimization formulation and preference divergence metric

Can Refute

10 retrieved papers

The authors formulate a direct fine-grained preference optimization (DFPO) objective that extends DPO to handle multiple fine-grained preference aspects. They introduce preference divergence (PD) as a metric to quantify conflicts between different aspect-specific preferences in aggregated datasets.

10 retrieved papers

Can Refute

Data selection method based on preference divergence with theoretical guarantees

10 retrieved papers

The authors recast the optimization problem as a data selection task and propose selecting samples with the most negative PD values for training. They provide theoretical analysis showing loss-bound optimality of this selection strategy.

10 retrieved papers

Empirical validation demonstrating efficiency gains with reduced data

Can Refute

10 retrieved papers

The authors conduct comprehensive experiments across multiple settings and datasets, demonstrating that their method achieves superior performance compared to full-data alignment while using only 30% of the data, validating the feasibility of alignment with fine-grained preferences.

10 retrieved papers

Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[1] Aligning large language models via fine-grained supervision PDF

Do Jaeyoung, Kim, Minseok, Ladhak, Faisal, Qiu, Liang, Xu Dehong (2024)

[17] Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs PDF

Gu, Yuzhe, Zhang, Wenwei, Yuzhe Gu, Lyu, Chengqi, Wenwei Zhang, Lin, Dahua, Chengqi Lyu, Chen Kai, Dahua Lin, Kai Chen (2025)

[50] Selective Preference Optimization via Token-Level Reward Function Estimation PDF

Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou (2024) • Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Direct fine-grained preference optimization formulation and preference divergence metric

[52] Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models PDF

Can Refute

[51] Improving alignment of dialogue agents via targeted human judgements PDF

Cannot Refute

[53] Panacea: Pareto alignment via preference adaptation for llms PDF

Cannot Refute

[54] Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models PDF

Cannot Refute

[55] Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models PDF

Cannot Refute

[56] MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search PDF

Cannot Refute

[57] Elevating sentiment analysis with resilient grey wolf optimization-based Gaussian-enhanced quantum deep neural networks in online shopping PDF

Cannot Refute

[58] Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models PDF

Cannot Refute

[59] ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions PDF

Cannot Refute

[60] Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach PDF

Cannot Refute

Contribution

Data selection method based on preference divergence with theoretical guarantees

[70] Bayesian Active Learning for Classification and Preference Learning PDF

Cannot Refute

[71] Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization PDF

Cannot Refute

[72] Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF PDF

Cannot Refute

[73] On the role of preference variance in preference optimization PDF

Cannot Refute

[74] ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment PDF

Cannot Refute

[75] Active preference-based Gaussian process regression for reward learning and optimization PDF

Cannot Refute

[76] Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint PDF

Cannot Refute

[77] Transitive inference as probabilistic preference learning PDF

Cannot Refute

[78] Language model preference evaluation with multiple weak evaluators PDF

Cannot Refute

[79] Data Optimization for LLMs: A Survey PDF

Cannot Refute

Contribution

Empirical validation demonstrating efficiency gains with reduced data

[12] What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning PDF

Can Refute

[61] Instruction tuning for large language models: A survey PDF

Cannot Refute

[62] Lima: Less is more for alignment PDF

Cannot Refute

[63] Specializing Smaller Language Models towards Multi-Step Reasoning PDF

Cannot Refute

[64] Tallrec: An effective and efficient tuning framework to align large language model with recommendation PDF

Cannot Refute

[65] Alignbench: Benchmarking chinese alignment of large language models PDF

Cannot Refute

[66] Distilling Reasoning Capabilities into Smaller Language Models PDF

Cannot Refute

[67] Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models PDF

Cannot Refute

[68] Language models are few-shot learners PDF

Cannot Refute

[69] Constructive Large Language Models Alignment with Diverse Feedback PDF

Cannot Refute

Data Selection for LLM Alignment Using Fine-Grained Preferences

Overview

Overall Novelty Assessment

Taxonomy

Research Landscape Overview

Claimed Contributions

Core Task Comparisons

[1] Aligning large language models via fine-grained supervision PDF

[17] Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs PDF

[50] Selective Preference Optimization via Token-Level Reward Function Estimation PDF

Contribution Analysis

Direct fine-grained preference optimization formulation and preference divergence metric

[52] Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization for Language Models PDF

[51] Improving alignment of dialogue agents via targeted human judgements PDF

[53] Panacea: Pareto alignment via preference adaptation for llms PDF

[54] Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models PDF

[55] Assessing and Mitigating Medical Knowledge Drift and Conflicts in Large Language Models PDF

[56] MAVIS: Multi-Objective Alignment via Value-Guided Inference-Time Search PDF

[57] Elevating sentiment analysis with resilient grey wolf optimization-based Gaussian-enhanced quantum deep neural networks in online shopping PDF

[58] Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models PDF

[59] ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions PDF

[60] Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach PDF

Data selection method based on preference divergence with theoretical guarantees

[70] Bayesian Active Learning for Classification and Preference Learning PDF

[71] Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization PDF

[72] Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF PDF

[73] On the role of preference variance in preference optimization PDF

[74] ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment PDF

[75] Active preference-based Gaussian process regression for reward learning and optimization PDF

[76] Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint PDF

[77] Transitive inference as probabilistic preference learning PDF

[78] Language model preference evaluation with multiple weak evaluators PDF

[79] Data Optimization for LLMs: A Survey PDF

Empirical validation demonstrating efficiency gains with reduced data

[12] What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning PDF

[61] Instruction tuning for large language models: A survey PDF

[62] Lima: Less is more for alignment PDF

[63] Specializing Smaller Language Models towards Multi-Step Reasoning PDF

[64] Tallrec: An effective and efficient tuning framework to align large language model with recommendation PDF

[65] Alignbench: Benchmarking chinese alignment of large language models PDF

[66] Distilling Reasoning Capabilities into Smaller Language Models PDF

[67] Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models PDF

[68] Language models are few-shot learners PDF

[69] Constructive Large Language Models Alignment with Diverse Feedback PDF

Table of Contents