PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives
Overview
Overall Novelty Assessment
The paper introduces PoliCon, a benchmark for evaluating large language models on drafting political consensus resolutions from divergent party positions, built on European Parliament deliberation records. It sits in the 'LLM-Based Consensus Resolution Drafting' leaf, which contains only two papers, including this work. The leaf is a sparse, emerging direction within the broader computational consensus generation branch, which suggests that applying LLMs to political resolution drafting is still at an early stage.
The taxonomy reveals neighboring work in computational belief estimation and polarization analysis, which focuses on detecting agreement points from social media rather than generating resolution text. The broader field includes empirical deliberation analysis examining real-world consensus mechanisms and party position measurement studies that quantify ideological stances. PoliCon bridges computational generation with empirical political science by grounding its tasks in actual parliamentary data, distinguishing it from purely theoretical frameworks or human-centered deliberation studies excluded from its immediate branch.
The search examined thirty candidates in total, ten per claimed contribution. For the PoliCon benchmark, one of the ten candidates is refutable. For the social choice theory evaluation framework, none of the ten yields a clear refutation. For the data collection procedure, one of the ten overlaps with prior work. These counts suggest that some prior work exists for the benchmark and data aspects, while the evaluation framework appears more distinctive within this limited search scope. Because the candidate pool is small, the assessment captures the top semantic matches rather than exhaustive coverage.
Given the sparse taxonomy leaf and limited literature search scope, the work appears to occupy relatively novel ground in applying LLMs to political consensus drafting. However, the presence of refutable candidates for two of three contributions indicates some foundational elements have precedent. The analysis reflects top-thirty semantic matches and does not claim comprehensive field coverage, leaving open the possibility of additional relevant work outside this search boundary.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors construct PoliCon, a benchmark built from 2,225 European Parliament deliberation records spanning 2009-2022. It evaluates LLMs' ability to generate consensus resolutions under diverse collective decision-making contexts by incorporating four adjustable factors: political issues, political goals, participating parties, and power structures based on seat distribution.
The authors develop an open-ended evaluation framework grounded in social choice theory that first simulates voting outcomes for each political party using an LLM-as-a-judge approach, then maps these votes to quantitative scores based on specific political consensus objectives such as simple majority, two-thirds majority, veto power, Rawlsianism, and Utilitarianism.
The authors perform comprehensive data collection from multiple sources (European Parliament website, HowTheyVote, VoteWatch Europe dataset) and apply extensive cleaning procedures using DeepSeek-R1 and rule-based methods. This produces integrated sextuples of issue, topic, background, stances, resolution, and votes for each parliamentary record.
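As a rough illustration of the sextuple described above, the sketch below models one record as a Python dataclass. The class name `DeliberationRecord`, the field types, the vote labels, and the `restrict_to` helper are all assumptions made for illustration; none of them is taken from the PoliCon release.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class DeliberationRecord:
    """One parliamentary record as a sextuple (field types are assumed)."""
    issue: str
    topic: str
    background: str
    stances: Dict[str, str]  # party name -> stance text
    resolution: str
    votes: Dict[str, str]    # party name -> vote label (hypothetical labels)

    def restrict_to(self, parties: Iterable[str]) -> "DeliberationRecord":
        """Return a copy that keeps only the given participating parties."""
        keep = set(parties)
        return DeliberationRecord(
            issue=self.issue,
            topic=self.topic,
            background=self.background,
            stances={p: s for p, s in self.stances.items() if p in keep},
            resolution=self.resolution,
            votes={p: v for p, v in self.votes.items() if p in keep},
        )


# Toy example with made-up party names and texts (not real benchmark data).
record = DeliberationRecord(
    issue="example issue",
    topic="example topic",
    background="example background",
    stances={"Party A": "supports", "Party B": "opposes", "Party C": "mixed"},
    resolution="example resolution text",
    votes={"Party A": "for", "Party B": "against", "Party C": "abstain"},
)
subset = record.restrict_to(["Party A", "Party B"])
print(sorted(subset.stances))
```

Restricting a record to a subset of parties is one plausible way to vary the "participating parties" factor while leaving the issue, background, and resolution unchanged.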
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[2] EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding
Contribution Analysis
Detailed comparisons for each claimed contribution
PoliCon benchmark for evaluating LLMs on political consensus
The authors construct PoliCon, a benchmark built from 2,225 European Parliament deliberation records spanning 2009-2022. It evaluates LLMs' ability to generate consensus resolutions under diverse collective decision-making contexts by incorporating four adjustable factors: political issues, political goals, participating parties, and power structures based on seat distribution.
[2] EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding
[21] How large language models can reshape collective intelligence
[22] How does AI compare to the experts in a Delphi setting: simulating medical consensus with large language models
[23] Temperature and persona shape LLM agent consensus with minimal accuracy gains in qualitative coding
[24] Towards Deliberating Agents: Evaluating the Ability of Large Language Models to Deliberate
[25] Towards Insider Summarization for Mediation Instead of Moderation: Examining Wikipedian Views on Key Elements of Discussion Summaries
[26] Piloting ModiBot: A Large Language Model-Based Moderator in Normal and Emotionally Challenging Focus Group Interactions
[27] CivicParse: A Benchmark and Pipeline for Structured Online Deliberation
[28] 9 Artificial intelligence in deliberative democracy
[29] A Climatology-specific Large Language Model using Ensemble Projection Data toward Regional Climate Services
Evaluation framework based on social choice theory
The authors develop an open-ended evaluation framework grounded in social choice theory that first simulates voting outcomes for each political party using an LLM-as-a-judge approach, then maps these votes to quantitative scores based on specific political consensus objectives such as simple majority, two-thirds majority, veto power, Rawlsianism, and Utilitarianism.
[30] The dynamics of strategic voting: pathways to consensus and gridlock
[31] Multi-agent Opinion Pooling by Voting for Bins: Simulations and Characterization
[32] Social influence and consensus building: Introducing a q-voter model with weighted influence
[33] An Uncertainty- and Collusion-Proof Voting Consensus Mechanism in Blockchain
[34] Society Protect: Developing A Voting Mechanism for Blockchain-based AI Alignment
[35] Toward decentralization in DPoS systems: Election, voting, and leader selection using virtual stake
[36] Weighted voting on the blockchain: Improving consensus in proof of stake protocols
[37] Multi-agent Opinion Pooling by Voting
[38] Next-Generation Relay Voting Scheme Design Leveraging Consensus Algorithms
[39] Deliberative democracy and social choice
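To make the vote-to-score step of this evaluation framework more concrete, the following is a minimal sketch of how simulated party votes might be mapped to a score under each named objective. The exact scoring rules are not given here, so every definition below is an assumption: per-party approval values in [0, 1], the 0.5 approval cutoff, seat-weighted majorities, the veto rule, and the function names are all hypothetical.

```python
from typing import Dict, Optional


def seats_in_favor(approval: Dict[str, float], seats: Dict[str, int]) -> int:
    """Seats held by parties whose simulated approval is at least 0.5 (assumed cutoff)."""
    return sum(seats[p] for p, a in approval.items() if a >= 0.5)


def consensus_score(
    approval: Dict[str, float],
    seats: Dict[str, int],
    objective: str,
    veto_party: Optional[str] = None,
) -> float:
    """Map simulated party votes to a score under one assumed objective definition."""
    total = sum(seats.values())
    yes = seats_in_favor(approval, seats)
    if objective == "simple_majority":
        # Pass if strictly more than half of all seats are in favor.
        return float(2 * yes > total)
    if objective == "two_thirds_majority":
        # Pass if at least two thirds of all seats are in favor (integer arithmetic).
        return float(3 * yes >= 2 * total)
    if objective == "veto":
        # Assumed rule: a simple majority is needed and the veto party must approve.
        if veto_party is None:
            raise ValueError("veto objective needs a veto_party")
        return float(2 * yes > total and approval[veto_party] >= 0.5)
    if objective == "rawlsian":
        # Rawlsian reading: the score is the approval of the least satisfied party.
        return min(approval.values())
    if objective == "utilitarian":
        # Utilitarian reading: seat-weighted mean approval across parties.
        return sum(approval[p] * seats[p] for p in approval) / total
    raise ValueError(f"unknown objective: {objective}")


# Toy inputs with made-up parties, approvals, and seat counts.
approval = {"Party A": 0.9, "Party B": 0.6, "Party C": 0.2}
seats = {"Party A": 40, "Party B": 25, "Party C": 35}
for name in ("simple_majority", "two_thirds_majority", "rawlsian", "utilitarian"):
    print(name, consensus_score(approval, seats, name))
print("veto", consensus_score(approval, seats, "veto", veto_party="Party C"))
```

The same set of simulated votes can score very differently across objectives, which is the point of evaluating one resolution against several consensus goals. In the framework as described, the approval values would come from an LLM-as-a-judge step rather than being supplied by hand.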
Large-scale data collection and cleaning procedure
The authors perform comprehensive data collection from multiple sources (European Parliament website, HowTheyVote, VoteWatch Europe dataset) and apply extensive cleaning procedures using DeepSeek-R1 and rule-based methods. This produces integrated sextuples of issue, topic, background, stances, resolution, and votes for each parliamentary record.