Data Selection for LLM Alignment Using Fine-Grained Preferences

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: Data Selection, Preference Alignment
Abstract:

Large language model (LLM) alignment aims to ensure that LLM behavior matches human preferences. While collecting data from multiple fine-grained, aspect-specific preferences is becoming increasingly feasible, existing alignment methods typically operate on a single preference and thus struggle with the conflicts inherent in such aggregated datasets. As an early attempt, this paper proposes a data-centric approach to align LLMs through the effective use of fine-grained preferences. Specifically, we formulate the problem as direct fine-grained preference optimization and introduce preference divergence (PD), which quantifies inter-aspect preference conflicts. Instead of directly tackling the resulting complicated optimization, we recast it as a data selection problem and propose a simple yet effective strategy that identifies the subset of data with the most negative PD values for efficient training. We theoretically analyze the loss-bound optimality of our selection strategy and conduct extensive empirical studies across varied settings and datasets, demonstrating that our practical selection method achieves consistent improvements over standard full-data alignment using as little as 30% of the data. Our work supports the view that LLM alignment using fine-grained preferences is highly feasible.

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a data-centric approach to align large language models using fine-grained, aspect-specific preferences, introducing a preference divergence metric to quantify inter-aspect conflicts and a data selection strategy to mitigate them. It resides in the Token-Level and Sentence-Level Preference Alignment leaf, which contains four papers total, including the original work. This leaf sits within the broader Fine-Grained Preference Modeling and Optimization branch, indicating a moderately populated research direction focused on granular preference signals rather than coarse response-level feedback.

The taxonomy reveals that the paper's immediate neighbors include Multi-Aspect Preference Alignment (two papers) and several data-centric branches under Data Selection and Curation for Alignment, such as Quality-Based Data Selection Strategies (six papers) and Synthetic and Automated Preference Data Construction (five papers). The scope note for the original leaf excludes aspect-based or multi-dimensional decomposition, yet the paper explicitly addresses multi-aspect conflicts, suggesting it bridges token-level granularity with multi-aspect reasoning. This positioning places it at the boundary between fine-grained modeling and data curation, connecting algorithmic refinement with strategic dataset construction.

Among thirty candidates examined, the analysis identified limited prior work overlap. The first contribution—formulating direct fine-grained preference optimization and introducing preference divergence—was refuted by one candidate out of ten examined, indicating some conceptual precedent in the limited search scope. The second contribution—data selection based on preference divergence with theoretical guarantees—found no refutable candidates among ten examined, suggesting relative novelty within the sampled literature. The third contribution—empirical validation of efficiency gains—encountered one refutable candidate among ten, implying that efficiency-focused evaluations with reduced data have appeared in prior work, though the specific combination with preference divergence may differ.

Based on the limited search scope of thirty semantically similar candidates, the work appears to occupy a niche intersection of fine-grained preference modeling and data selection, with the preference divergence-driven selection strategy showing the least prior overlap. The analysis does not cover exhaustive citation networks or domain-specific venues, so additional related work may exist beyond the top-K semantic matches examined here.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 30
Refutable papers: 2

Research Landscape Overview

Core task: Aligning large language models using fine-grained preferences with data selection. The field has evolved into several interconnected branches that address complementary aspects of preference-based alignment. Fine-Grained Preference Modeling and Optimization focuses on methods that move beyond response-level feedback to capture token-level or sentence-level signals, enabling more precise control over model behavior through techniques like Mask-DPO[17] and Fine-Grained Supervision[1]. Data Selection and Curation for Alignment emphasizes the quality and composition of training data, with works such as Good Data Alignment[12] and Clean Data Curation[13] demonstrating that careful filtering and sampling strategies can substantially improve alignment outcomes. Preference Optimization Frameworks encompasses algorithmic innovations that refine the core optimization process, including methods like Distribution Preference Optimization[31] and Noise Contrastive Alignment[32]. Domain-Specific and Multimodal Alignment extends these techniques to specialized settings, from vision-language models like Vision-R1[5] and Align2LLaVA[11] to biomedical applications such as Biomedical Clinician Preference[29]. Finally, Personalized and Individualized Alignment explores how to tailor models to diverse user preferences, as seen in Individual Preferences Interaction[3] and LifeAlign[6].

Recent work has increasingly recognized that granularity and data quality are tightly coupled challenges. While many studies pursue finer-grained supervision signals to reduce credit assignment problems, others highlight that even sophisticated optimization methods can falter without high-quality preference data, as explored in Preference Noise Impact[27] and Ambiguous Preference Pairs[36].
Fine-Grained Preferences[0] sits at the intersection of token-level modeling and strategic data selection, closely aligned with neighbors like Fine-Grained Supervision[1] and Selective Preference Optimization[50]. Compared to Mask-DPO[17], which focuses primarily on masking mechanisms for token-level credit assignment, Fine-Grained Preferences[0] integrates data selection criteria to ensure that fine-grained signals are drawn from informative examples. This dual emphasis distinguishes it from purely algorithmic refinements and positions it within an emerging cluster of methods that treat preference granularity and data curation as mutually reinforcing design choices.

Claimed Contributions

Direct fine-grained preference optimization formulation and preference divergence metric

The authors formulate a direct fine-grained preference optimization (DFPO) objective that extends DPO to handle multiple fine-grained preference aspects. They introduce preference divergence (PD) as a metric to quantify conflicts between different aspect-specific preferences in aggregated datasets.

10 retrieved papers
Can Refute
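The report does not reproduce the paper's exact PD formula. As a purely hypothetical illustration of how an inter-aspect conflict score could work, suppose each training sample carries a signed, DPO-style reward margin per aspect (positive when that aspect agrees with the aggregate chosen/rejected label). One simple conflict score sums the negative margins, so a more negative value indicates stronger disagreement among aspects:

```python
def preference_divergence(aspect_margins):
    """Toy inter-aspect conflict score: sum of the negative per-aspect margins.

    aspect_margins: dict mapping aspect name -> signed reward margin
    (positive = that aspect prefers the aggregate chosen response).
    Returns 0 when all aspects agree; more negative = more conflict.
    This is an illustrative stand-in, not the paper's actual PD definition.
    """
    return sum(m for m in aspect_margins.values() if m < 0)

# All aspects agree with the aggregate label -> no divergence.
assert preference_divergence({"helpful": 0.8, "harmless": 0.3}) == 0
# One aspect disagrees -> score goes negative by that margin.
assert preference_divergence({"helpful": 0.8, "harmless": -0.5}) == -0.5
```

Under this reading, samples where every aspect agrees score zero, and the "most negative PD" samples singled out by the paper are exactly the most conflicted ones.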
Data selection method based on preference divergence with theoretical guarantees

The authors recast the optimization problem as a data selection task and propose selecting samples with the most negative PD values for training. They provide theoretical analysis showing loss-bound optimality of this selection strategy.

10 retrieved papers
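Given per-sample PD scores, the selection step described above reduces to keeping the fraction of samples with the most negative values. A minimal sketch, assuming precomputed scores (function and variable names are hypothetical; the 30% default follows the report's reduced-data setting):

```python
def select_by_pd(samples, pd_scores, keep_ratio=0.3):
    """Keep the keep_ratio fraction of samples with the most negative
    preference-divergence scores (i.e., the most conflicted examples)."""
    k = max(1, int(len(samples) * keep_ratio))
    # Sort indices by PD ascending: most negative scores come first.
    ranked = sorted(range(len(samples)), key=lambda i: pd_scores[i])
    return [samples[i] for i in ranked[:k]]

data = ["s0", "s1", "s2", "s3", "s4"]
scores = [0.2, -1.5, -0.1, -2.0, 0.5]
# 40% of 5 samples = 2; the two most negative scores belong to s3 and s1.
assert select_by_pd(data, scores, keep_ratio=0.4) == ["s3", "s1"]
```

The selected subset would then feed a standard preference-optimization training loop in place of the full dataset.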
Empirical validation demonstrating efficiency gains with reduced data

The authors conduct comprehensive experiments across multiple settings and datasets, demonstrating that their method achieves superior performance compared to full-data alignment while using only 30% of the data, validating the feasibility of alignment with fine-grained preferences.

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Direct fine-grained preference optimization formulation and preference divergence metric


Contribution

Data selection method based on preference divergence with theoretical guarantees


Contribution

Empirical validation demonstrating efficiency gains with reduced data
