Generative Value Conflicts Reveal LLM Priorities
Overview
Overall Novelty Assessment
The paper introduces ConflictScope, an automated pipeline for generating value conflict scenarios and evaluating how language models prioritize competing values. It sits within the 'Automated Conflict Scenario Generation' leaf of the taxonomy, which contains only two papers; even the adjacent areas of multi-objective reinforcement learning and moral dilemma datasets hold just three papers each, so the direction is sparse overall. The work addresses a recognized gap in alignment datasets, which lack sufficient value conflict scenarios, and positions itself as a methodological contribution to conflict evaluation infrastructure.
The taxonomy reveals that ConflictScope's nearest neighbors include manually curated moral dilemma datasets (AI Risk Dilemmas, DailyDilemmas, Moral Scenarios) and value prioritization evaluation protocols that measure ranking consistency and human-AI alignment. The automated generation approach contrasts with manual curation efforts, aiming for scalable coverage of diverse value combinations. The work also connects to inference-time alignment methods through its system prompting experiments, though it focuses on evaluation rather than developing new alignment techniques. The taxonomy's scope notes clarify that this leaf excludes manually curated datasets and pure evaluation protocols, emphasizing the generative automation aspect.
Among the thirty candidates examined across the three claimed contributions, none was identified as clearly refuting the work's novelty. Each of the three contributions (the ConflictScope pipeline, the open-ended evaluation method, and the value ranking elicitation methodology) was checked against ten candidates, with no refuting matches found. This suggests that, within the limited search scope, the specific combination of automated scenario generation, open-ended response evaluation, and value ranking elicitation remains relatively unexplored. The findings that models shift from protective to personal values in open-ended settings and that system prompting improves alignment by 14% are empirical observations rather than methodological claims subject to direct refutation.
Based on the limited literature search of thirty semantically similar papers, the work appears to occupy a methodologically distinct position within value conflict evaluation. The analysis cannot assess whether larger-scale searches or domain-specific venues might reveal closer prior work. The taxonomy structure suggests this is an emerging research direction with room for methodological innovation, though the field overall shows substantial activity across related evaluation and alignment challenges.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors present ConflictScope, an automated system that generates realistic scenarios where language models face conflicts between pairs of values from a user-defined set, then evaluates model responses in open-ended settings to elicit value rankings. The pipeline includes scenario creation, filtering, and open-ended evaluation with simulated users.
The authors introduce an evaluation approach that moves beyond multiple-choice questioning by simulating realistic user interactions. An LLM generates user prompts based on scenario context, target models respond, and a judge LLM determines which value-aligned action was taken, enabling comparison of expressed versus revealed preferences.
The authors develop a method to aggregate model preferences across value conflict scenarios into complete value rankings using Bradley-Terry models, and demonstrate how system prompts can steer models toward target rankings with moderate success (14% improvement in alignment).
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[12] Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
Contribution Analysis
Detailed comparisons for each claimed contribution
ConflictScope automated pipeline for value conflict scenario generation and evaluation
The authors present ConflictScope, an automated system that generates realistic scenarios where language models face conflicts between pairs of values from a user-defined set, then evaluates model responses in open-ended settings to elicit value rankings. The pipeline includes scenario creation, filtering, and open-ended evaluation with simulated users.
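To make the pipeline shape concrete, here is a minimal sketch of how such a generate-then-filter loop could be wired together. Everything below is illustrative: the `chat` helper stands in for any LLM call, and the prompts, function names, and JSON schema are assumptions, not the authors' implementation.

```python
import itertools
import json

def chat(prompt: str) -> str:
    """Placeholder for a call to any chat-completion LLM API."""
    raise NotImplementedError

def generate_scenarios(values: list[str], n_per_pair: int = 5) -> list[dict]:
    """Stage 1: draft scenarios that pit each pair of values against each other."""
    scenarios = []
    for v1, v2 in itertools.combinations(values, 2):
        for _ in range(n_per_pair):
            draft = chat(
                f"Write a realistic request to an AI assistant in which "
                f"upholding '{v1}' conflicts with upholding '{v2}'. "
                f"Return JSON with keys 'context' and 'user_request'."
            )
            # Assumes the generator reliably emits the requested JSON.
            scenarios.append({"values": (v1, v2), **json.loads(draft)})
    return scenarios

def filter_scenarios(scenarios: list[dict]) -> list[dict]:
    """Stage 2: keep only drafts that a grader judges to pose a genuine trade-off."""
    kept = []
    for s in scenarios:
        v1, v2 = s["values"]
        verdict = chat(
            f"Scenario: {s['user_request']}\n"
            f"Does answering force a trade-off between '{v1}' and '{v2}'? "
            f"Reply YES or NO."
        )
        if verdict.strip().upper().startswith("YES"):
            kept.append(s)
    return kept
```

Stage 3, the open-ended evaluation with simulated users, is sketched under the second contribution below.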
[61] CulturePark: Boosting Cross-Cultural Understanding in Large Language Models
[62] The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems
[63] Toward Value Scenario Generation Through Large Language Models
[64] Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks
[65] "To Pull or Not to Pull?": Investigating Moral Biases in Leading Large Language Models Across Ethical Dilemmas
[66] Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs
[67] Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?
[68] Measuring Ethical Behavior with AI and Natural Language Processing to Assess Business Success
[69] Are LLMs Complicated Ethical Dilemma Analyzers?
[70] Natural-Language Mediation Versus Numerical Aggregation in Multi-Stakeholder AI Governance: Capability Boundaries and Architectural Requirements
Open-ended evaluation method using simulated user interaction
The authors introduce an evaluation approach that moves beyond multiple-choice questioning by simulating realistic user interactions. An LLM generates user prompts based on scenario context, target models respond, and a judge LLM determines which value-aligned action was taken, enabling comparison of expressed versus revealed preferences.
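A rough sketch of the simulated-user loop this describes, reusing the hypothetical `chat` helper and scenario dictionaries from the pipeline sketch above; the prompt wording, the single-turn structure, and the naive verdict parsing are illustrative simplifications rather than the paper's protocol.

```python
def simulate_user_message(chat, scenario: dict) -> str:
    """A user-simulator LLM writes the opening message implied by the scenario."""
    return chat(
        f"You are role-playing the user in this scenario:\n{scenario['context']}\n"
        f"Write the message this user would send to an AI assistant."
    )

def judge_action(chat, scenario: dict, response: str) -> str:
    """A judge LLM labels which of the two conflicting values the reply prioritized."""
    v1, v2 = scenario["values"]
    verdict = chat(
        f"Scenario: {scenario['context']}\n"
        f"Assistant reply: {response}\n"
        f"Which value did the assistant's action prioritize: '{v1}' or '{v2}'? "
        f"Answer with exactly one of the two."
    )
    # Naive parsing: default to v2 if the judge's wording is ambiguous.
    return v1 if v1.lower() in verdict.lower() else v2

def run_open_ended_eval(chat, target_model, scenarios: list[dict]) -> list[tuple]:
    """Collect (scenario, winning value) pairs for downstream ranking."""
    results = []
    for s in scenarios:
        user_msg = simulate_user_message(chat, s)
        reply = target_model(user_msg)  # the model under evaluation
        results.append((s, judge_action(chat, s, reply)))
    return results
```

Because the target model answers a free-form message rather than picking from labeled options, the judge's labels capture revealed preferences, which can then be compared against the expressed preferences from multiple-choice questioning.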
[51] Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models
[52] Flipping the Dialogue: Training and Evaluating User Language Models
[53] Out of One, Many: Using Language Models to Simulate Human Samples
[54] The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
[55] Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models
[56] Benchmarking Open-Ended Audio Dialogue Understanding for Large Audio-Language Models
[57] Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
[58] EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
[59] An Evaluation Framework for Clinical Use of Large Language Models in Patient Interaction Tasks
[60] SimulBench: Evaluating Language Models with Creative Simulation Tasks
Methodology for eliciting and steering value rankings from language models
The authors develop a method to aggregate model preferences across value conflict scenarios into complete value rankings using Bradley-Terry models, and demonstrate how system prompts can steer models toward target rankings with moderate success (14% improvement in alignment).
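As a concrete reference point, a Bradley-Terry model of this kind can be fit with the standard minorization-maximization update (Hunter, 2004). The sketch below is a generic implementation of that textbook algorithm, not the authors' code; it assumes judge labels have already been tallied into a pairwise win matrix and that every value wins at least one comparison, and the value names and counts are invented for illustration.

```python
import numpy as np

def bradley_terry(wins: np.ndarray, iters: int = 500, tol: float = 1e-9) -> np.ndarray:
    """Fit Bradley-Terry strengths from pairwise wins.

    wins[i, j] = number of scenarios in which value i was prioritized over
    value j. Assumes each value wins at least once, so no strength collapses
    to zero.
    """
    comparisons = wins + wins.T    # n_ij: head-to-head scenario counts
    total_wins = wins.sum(axis=1)  # W_i: total wins per value
    p = np.ones(wins.shape[0])
    for _ in range(iters):
        denom = comparisons / (p[:, None] + p[None, :])
        np.fill_diagonal(denom, 0.0)
        p_new = total_wins / denom.sum(axis=1)  # MM update
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p

# Sorting strengths descending yields the elicited value ranking. Rank
# correlation between the elicited and target rankings (e.g., Kendall's tau
# via scipy.stats.kendalltau) is one plausible way to score how much a
# steering system prompt improves alignment, though the paper's exact metric
# may differ.
values = ["harmlessness", "honesty", "helpfulness", "autonomy"]  # illustrative
wins = np.array([[0, 8, 9, 7],
                 [2, 0, 6, 5],
                 [1, 4, 0, 6],
                 [3, 5, 4, 0]])
ranking = [values[i] for i in np.argsort(-bradley_terry(wins))]
```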