Generative Value Conflicts Reveal LLM Priorities

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: LLM alignment, value alignment, evaluation, moral dilemmas
Abstract:

Past work seeks to align large language model (LLM)-based assistants with a target set of values, but such assistants are frequently forced to make tradeoffs between values when deployed. In response to the scarcity of value conflict in existing alignment datasets, we introduce ConflictScope, an automatic pipeline to evaluate how LLMs prioritize different values. Given a user-defined value set, ConflictScope automatically generates scenarios in which a language model faces a conflict between two values sampled from the set. It then prompts target models with an LLM-written "user prompt" and evaluates their free-text responses to elicit a ranking over values in the value set. Comparing results between multiple-choice and open-ended evaluations, we find that models shift away from supporting protective values, such as harmlessness, and toward supporting personal values, such as user autonomy, in more open-ended value conflict settings. However, including detailed value orderings in models' system prompts improves alignment with a target ranking by 14%, showing that system prompting can achieve moderate success at aligning LLM behavior under value conflict. Our work demonstrates the importance of evaluating value prioritization in models and provides a foundation for future work in this area.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work, and the current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces ConflictScope, an automated pipeline for generating value conflict scenarios and evaluating how language models prioritize competing values. It sits within the 'Automated Conflict Scenario Generation' leaf of the taxonomy, which contains only two papers total. This is a relatively sparse research direction compared to more crowded areas like multi-objective reinforcement learning (three papers) or moral dilemma datasets (three papers). The work addresses a recognized gap in alignment datasets that lack sufficient value conflict scenarios, positioning itself as a methodological contribution to conflict evaluation infrastructure.

The taxonomy reveals that ConflictScope's nearest neighbors include manually curated moral dilemma datasets (AI Risk Dilemmas, DailyDilemmas, Moral Scenarios) and value prioritization evaluation protocols that measure ranking consistency and human-AI alignment. The automated generation approach contrasts with manual curation efforts, aiming for scalable coverage of diverse value combinations. The work also connects to inference-time alignment methods through its system prompting experiments, though it focuses on evaluation rather than developing new alignment techniques. The taxonomy's scope notes clarify that this leaf excludes manually curated datasets and pure evaluation protocols, emphasizing the generative automation aspect.

Among thirty candidates examined across the three contributions, none clearly refuted the work's novelty. The ConflictScope pipeline contribution was compared against ten candidates with zero refutable matches, as were the open-ended evaluation method and the value ranking elicitation methodology. This suggests that, within the limited search scope, the specific combination of automated scenario generation, open-ended response evaluation, and value ranking elicitation appears relatively unexplored. The findings that models shift from protective to personal values in open-ended settings, and that system prompting improves alignment by 14%, are empirical observations rather than methodological claims subject to direct refutation.

Based on the limited literature search of thirty semantically similar papers, the work appears to occupy a methodologically distinct position within value conflict evaluation. The analysis cannot assess whether larger-scale searches or domain-specific venues might reveal closer prior work. The taxonomy structure suggests this is an emerging research direction with room for methodological innovation, though the field overall shows substantial activity across related evaluation and alignment challenges.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: evaluating language model value prioritization under conflict. The field addresses how language models navigate situations where multiple values or objectives cannot be simultaneously satisfied. The taxonomy organizes research into several major branches: Multi-Objective Alignment Methods develop techniques for balancing competing objectives during training (e.g., Multi Objective GRPO[13], Pareto Multi Objective[14]); Value Conflict Characterization and Evaluation Frameworks define, measure, and generate scenarios where values clash (including AI Risk Dilemmas[12] and Generative Value Conflicts[0]); Domain-Specific Value Alignment examines conflicts in particular contexts such as cultural differences (Multi National Alignment[19]) or application areas; Specialized Alignment Challenges tackle issues like instruction hierarchies (Instruction Hierarchy[7]) and honesty-helpfulness trade-offs (Honesty Helpfulness Conflicts[6]); Supporting Resources provide datasets and methodologies (DailyDilemmas[22], Synthetic Moral Fables[23]); and Empirical Value Conflict Studies investigate how models actually behave when values compete (Privacy Prosocial Conflict[34], Right vs Right[35]).

Several active research directions reveal key tensions in the field. One line explores whether conflicts can be resolved through better training objectives or whether fundamental trade-offs are unavoidable (Fundamental Alignment Limitations[2], Safe RLHF[4]). Another examines how context should shape prioritization decisions (Contextual Value Alignment[9], Application Driven Alignment[3]).

Generative Value Conflicts[0] sits within the Value Conflict Dataset Construction cluster, specifically focusing on automated conflict scenario generation. It shares methodological kinship with AI Risk Dilemmas[12], which also constructs evaluative scenarios, but emphasizes generative approaches that produce diverse conflict cases at scale. Compared to manual curation efforts like DailyDilemmas[22], the automated generation strategy aims for broader coverage of the conflict space, though it faces distinct challenges in ensuring scenario realism and value representation fidelity.

Claimed Contributions

ConflictScope automated pipeline for value conflict scenario generation and evaluation

The authors present ConflictScope, an automated system that generates realistic scenarios where language models face conflicts between pairs of values from a user-defined set, then evaluates model responses in open-ended settings to elicit value rankings. The pipeline includes scenario creation, filtering, and open-ended evaluation with simulated users.

10 retrieved papers
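The generate-then-filter structure of such a pipeline can be sketched as follows. This is an illustrative sketch only, not ConflictScope's actual code: the function names, the scenario schema, and the toy `propose`/`is_valid` stand-ins (which replace real LLM calls) are all assumptions.

```python
import itertools

def build_scenarios(values, propose, is_valid, n_per_pair=2):
    """For each pair of values, propose candidate scenarios and keep valid ones."""
    scenarios = []
    for v1, v2 in itertools.combinations(values, 2):
        # In a real pipeline, `propose` would query an LLM for a scenario
        # pitting v1 against v2, and `is_valid` would filter out candidates
        # that fail realism or genuine-conflict checks.
        candidates = [propose(v1, v2) for _ in range(n_per_pair)]
        scenarios.extend(s for s in candidates if is_valid(s))
    return scenarios

# Toy stand-ins: a template "generator" and a trivial filter.
values = ["harmlessness", "helpfulness", "user autonomy"]
scenarios = build_scenarios(
    values,
    propose=lambda a, b: {"values": (a, b),
                          "text": f"A situation pitting {a} against {b}."},
    is_valid=lambda s: len(s["text"]) > 0,
)
# With 3 values there are 3 pairs, so 2 candidates per pair yields 6 scenarios.
```

The separation of generation from filtering mirrors the scenario creation and filtering stages described above, and keeps the LLM-dependent parts swappable.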
Open-ended evaluation method using simulated user interaction

The authors introduce an evaluation approach that moves beyond multiple-choice questioning by simulating realistic user interactions. An LLM generates user prompts based on scenario context, target models respond, and a judge LLM determines which value-aligned action was taken, enabling comparison of expressed versus revealed preferences.

10 retrieved papers
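The simulated-user evaluation loop described above can be sketched roughly as below. The callables `generate_user_prompt`, `target_model`, and `judge` stand in for LLM calls; every name here is an illustrative assumption, not ConflictScope's actual API.

```python
def evaluate_scenario(scenario, generate_user_prompt, target_model, judge):
    """Return which of the scenario's two conflicting values the model favored."""
    user_prompt = generate_user_prompt(scenario)  # LLM-written user turn
    response = target_model(user_prompt)          # target model's free-text reply
    # A judge LLM maps the free-text response back to one of the two values,
    # enabling comparison of expressed vs. revealed preferences.
    verdict = judge(scenario, user_prompt, response)
    assert verdict in scenario["values"]
    return verdict

# Toy stand-ins so the loop is runnable end-to-end.
scenario = {"context": "user asks for risky advice",
            "values": ("harmlessness", "user autonomy")}
verdict = evaluate_scenario(
    scenario,
    generate_user_prompt=lambda s: f"User: {s['context']}",
    target_model=lambda p: "I can help, but here are the risks...",
    judge=lambda s, p, r: s["values"][1] if "help" in r else s["values"][0],
)
```

Keeping the judge's output constrained to the scenario's own value pair is what turns free-text responses into pairwise preference data that a ranking model can consume.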
Methodology for eliciting and steering value rankings from language models

The authors develop a method to aggregate model preferences across value conflict scenarios into complete value rankings using Bradley-Terry models, and demonstrate how system prompts can steer models toward target rankings with moderate success (14% improvement in alignment).

10 retrieved papers
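Aggregating pairwise outcomes into a full ranking with a Bradley-Terry model can be done with the classic minorization-maximization update. The sketch below is a minimal, self-contained illustration of that general technique under assumed inputs (a win-count matrix), not the paper's implementation.

```python
def bradley_terry(wins, n_iters=200):
    """Fit Bradley-Terry strengths from a matrix of pairwise win counts.

    wins[i][j] counts how often value i was prioritized over value j.
    Uses the MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j).
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(n_iters):
        new = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = 0.0
            for j in range(n):
                if j == i:
                    continue
                n_ij = wins[i][j] + wins[j][i]  # comparisons between i and j
                if n_ij:
                    denom += n_ij / (strengths[i] + strengths[j])
            new.append(total_wins / denom if denom else strengths[i])
        total = sum(new)
        strengths = [v / total for v in new]  # normalize (identifiability)
    return strengths

# Toy example: value 0 usually wins its conflicts with values 1 and 2.
wins = [[0, 8, 9],
        [2, 0, 5],
        [1, 5, 0]]
scores = bradley_terry(wins)
ranking = sorted(range(3), key=lambda i: -scores[i])
```

Sorting values by their fitted strengths yields the complete value ranking that the steering experiments then compare against a target ordering.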

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

ConflictScope automated pipeline for value conflict scenario generation and evaluation

The authors present ConflictScope, an automated system that generates realistic scenarios where language models face conflicts between pairs of values from a user-defined set, then evaluates model responses in open-ended settings to elicit value rankings. The pipeline includes scenario creation, filtering, and open-ended evaluation with simulated users.

Contribution

Open-ended evaluation method using simulated user interaction

The authors introduce an evaluation approach that moves beyond multiple-choice questioning by simulating realistic user interactions. An LLM generates user prompts based on scenario context, target models respond, and a judge LLM determines which value-aligned action was taken, enabling comparison of expressed versus revealed preferences.

Contribution

Methodology for eliciting and steering value rankings from language models

The authors develop a method to aggregate model preferences across value conflict scenarios into complete value rankings using Bradley-Terry models, and demonstrate how system prompts can steer models toward target rankings with moderate success (14% improvement in alignment).