PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives

ICLR 2026 Conference Submission
Anonymous Authors
Political Consensus Finding, Collective Decision Making, Benchmark on Societal Considerations
Abstract:

Achieving political consensus is crucial yet challenging for the effective functioning of social governance. Although frontier AI systems, represented by large language models (LLMs), have developed rapidly in recent years, their capabilities in this area remain understudied. In this paper, we introduce PoliCon, a novel benchmark constructed from 2,225 high-quality deliberation records of the European Parliament spanning 13 years, from 2009 to 2022, to evaluate the ability of LLMs to draft consensus resolutions based on divergent party positions under varying collective decision-making contexts and political requirements. Specifically, PoliCon incorporates four factors to build each task environment for finding different political consensus: specific political issues, political goals, participating parties, and power structures based on seat distribution. We also develop an evaluation framework based on social choice theory for PoliCon, which simulates the real voting outcomes of different political parties to assess whether LLM-generated resolutions meet the requirements of the predetermined political consensus. Our experimental results show that even state-of-the-art models perform unsatisfactorily on complex tasks such as passing resolutions by a two-thirds majority and addressing security issues. The results also uncover the models' inherent partisan biases and reveal behaviors LLMs adopt to achieve consensus, such as prioritizing the stance of the dominant party instead of uniting smaller parties. These findings highlight PoliCon's promise as an effective platform for studying LLMs' ability to promote political consensus.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PoliCon, a benchmark for evaluating large language models on drafting political consensus resolutions from divergent party positions using European Parliament deliberation records. It resides in the 'LLM-Based Consensus Resolution Drafting' leaf, which contains only two papers total including this work. This represents a sparse, emerging research direction within the broader computational consensus generation branch, suggesting the application of LLMs to political resolution drafting is still in early stages of exploration.

The taxonomy reveals neighboring work in computational belief estimation and polarization analysis, which focuses on detecting agreement points from social media rather than generating resolution text. The broader field includes empirical deliberation analysis examining real-world consensus mechanisms and party position measurement studies that quantify ideological stances. PoliCon bridges computational generation with empirical political science by grounding its tasks in actual parliamentary data, distinguishing it from purely theoretical frameworks or human-centered deliberation studies excluded from its immediate branch.

Thirty candidates were examined, ten per contribution. For the PoliCon benchmark contribution, one of the ten candidates is refutable; for the social choice theory evaluation framework, none of the ten yields a clear refutation; for the data collection procedure, one of the ten is an overlapping prior work. These statistics suggest that moderate prior work exists for the benchmark and data aspects, while the evaluation framework appears more distinctive within this limited search scope. The relatively small candidate pool means this assessment captures top semantic matches rather than exhaustive coverage.

Given the sparse taxonomy leaf and limited literature search scope, the work appears to occupy relatively novel ground in applying LLMs to political consensus drafting. However, the presence of refutable candidates for two of three contributions indicates some foundational elements have precedent. The analysis reflects top-thirty semantic matches and does not claim comprehensive field coverage, leaving open the possibility of additional relevant work outside this search boundary.

Taxonomy

Core-task Taxonomy Papers: 11
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 2

Research Landscape Overview

Core task: Drafting political consensus resolutions from divergent party positions. The field encompasses several complementary branches. Computational and AI-Based Consensus Generation explores algorithmic and machine learning approaches to synthesizing compromise text from opposing viewpoints, with recent work such as PoliCon[0] and EuroCon[2] leveraging large language models to automate resolution drafting. Empirical Analysis of Political Deliberation and Consensus Processes examines real-world negotiation dynamics, institutional mechanisms, and historical case studies like Federal Convention Consensus[8] and Conflict to Consensus[9] to understand how consensus emerges in practice. Party Position Measurement and Ideological Analysis focuses on quantifying and mapping ideological stances through expert surveys and scaling methods, exemplified by Chapel Hill Expert Surveys[4] and Unified Party Competition[3], providing the foundational data that computational methods often require. Campaign Discourse and Rhetorical Strategy Analysis investigates persuasive communication tactics and framing strategies, such as those studied in Debate Rhetorical Strategies[7], which inform how parties articulate and potentially reconcile their positions.

A particularly active line of work centers on LLM-based consensus drafting, where systems must balance fidelity to input positions with the generation of acceptable compromise language. PoliCon[0] sits squarely within this emerging computational branch, employing transformer-based models to produce resolution text that bridges partisan divides. Its emphasis on automated synthesis contrasts with the more data-driven orientation of works like Chapel Hill Expert Surveys[4], which catalog party positions without attempting generative reconciliation, and differs from empirical studies such as Federal Convention Consensus[8] that analyze human-led deliberation without computational intervention.

A key open question across these branches is how to validate consensus quality: whether through expert evaluation, alignment with historical outcomes, or stakeholder acceptance, and how computational approaches like PoliCon[0] can incorporate the nuanced understanding of ideological space and rhetorical strategy that traditional political science provides.

Claimed Contributions

PoliCon benchmark for evaluating LLMs on political consensus

The authors construct PoliCon, a benchmark built from 2,225 European Parliament deliberation records spanning 2009-2022. It evaluates LLMs' ability to generate consensus resolutions under diverse collective decision-making contexts by incorporating four adjustable factors: political issues, political goals, participating parties, and power structures based on seat distribution.
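As a rough illustration of how the four factors might define a single task environment, here is a minimal Python sketch. The class name `ConsensusTask`, its field names, and the party labels used with it are assumptions made for this example; the report does not describe the benchmark's actual data format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConsensusTask:
    """Hypothetical container for one task environment (names are assumptions)."""

    issue: str             # the specific political issue under deliberation
    goal: str              # the political goal, e.g. "two_thirds_majority"
    seats: dict[str, int]  # participating parties mapped to their seat counts

    @property
    def parties(self) -> list[str]:
        """Participating parties, in the order they were supplied."""
        return list(self.seats)

    def seat_shares(self) -> dict[str, float]:
        """Power structure expressed as each party's fraction of all seats."""
        total = sum(self.seats.values())
        if total <= 0:
            raise ValueError("seat counts must sum to a positive number")
        return {party: count / total for party, count in self.seats.items()}

    def dominant_party(self) -> str:
        """Party holding the most seats (first one wins on a tie)."""
        return max(self.seats, key=self.seats.get)
```

Keeping the seat distribution as the single source for both the party list and the power structure means that changing which parties participate automatically changes the power structure, which is one simple way to make the factors adjustable together.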

10 retrieved papers, Can Refute

Evaluation framework based on social choice theory

The authors develop an open-ended evaluation framework grounded in social choice theory that first simulates voting outcomes for each political party using an LLM-as-a-judge approach, then maps these votes to quantitative scores based on specific political consensus objectives such as simple majority, two-thirds majority, veto power, Rawlsianism, and Utilitarianism.
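The second step, mapping simulated party votes to a score for each consensus objective, can be sketched as follows. This is not the paper's scoring code: the formulas are textbook readings of each objective, and the names `PartyVote`, `approval_share`, and the 0.5 support cutoff for the veto holder are assumptions made for illustration. The `support` values stand in for whatever the LLM-as-a-judge step would produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartyVote:
    seats: int      # seats held by the party (the power structure)
    support: float  # simulated approval in [0, 1], assumed to come from a judge


def approval_share(votes: dict[str, PartyVote]) -> float:
    """Seat-weighted fraction of the chamber approving the resolution."""
    total = sum(v.seats for v in votes.values())
    return sum(v.seats * v.support for v in votes.values()) / total


def simple_majority(votes: dict[str, PartyVote]) -> bool:
    return approval_share(votes) > 0.5


def two_thirds_majority(votes: dict[str, PartyVote]) -> bool:
    return approval_share(votes) >= 2 / 3


def passes_with_veto(votes: dict[str, PartyVote], veto_party: str) -> bool:
    """Simple majority, and the veto holder must not oppose (assumed cutoff 0.5)."""
    return simple_majority(votes) and votes[veto_party].support >= 0.5


def rawlsian(votes: dict[str, PartyVote]) -> float:
    """Rawlsian reading: the welfare of the worst-off party, ignoring seats."""
    return min(v.support for v in votes.values())


def utilitarian(votes: dict[str, PartyVote]) -> float:
    """Utilitarian reading: seat-weighted mean support across parties."""
    return approval_share(votes)
```

Under these assumptions the objectives can disagree on the same resolution: a text backed fully by a large party and rejected by a small one may pass a simple majority while scoring zero on the Rawlsian objective.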

10 retrieved papers

Large-scale data collection and cleaning procedure

The authors perform comprehensive data collection from multiple sources (European Parliament website, HowTheyVote, VoteWatch Europe dataset) and apply extensive cleaning procedures using DeepSeek-R1 and rule-based methods. This produces integrated sextuples of issue, topic, background, stances, resolution, and votes for each parliamentary record.
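One way to picture the resulting sextuple is as a small record type with a rule-based completeness check. The sketch below is an assumption-laden illustration: the six field names follow the description above, but the class name `DeliberationRecord`, the field types, and the specific checks in `validation_errors` are hypothetical and are not taken from the authors' cleaning pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DeliberationRecord:
    """Hypothetical shape of one cleaned parliamentary record."""

    issue: str               # what the resolution is about
    topic: str               # broader topic category
    background: str          # context for the deliberation
    stances: dict[str, str]  # party name -> summarized position (assumed type)
    resolution: str          # text of the adopted resolution
    votes: dict[str, str]    # party name -> recorded vote (assumed type)


def validation_errors(record: DeliberationRecord) -> list[str]:
    """Return human-readable problems; an empty list means the record looks complete."""
    errors = []
    for name in ("issue", "topic", "background", "resolution"):
        if not getattr(record, name).strip():
            errors.append(f"empty field: {name}")
    if not record.stances:
        errors.append("no party stances")
    # Every party with a stance should also have a vote to compare against.
    missing = sorted(set(record.stances) - set(record.votes))
    if missing:
        errors.append("no vote for: " + ", ".join(missing))
    return errors
```

A check of this kind only covers structural completeness; judging whether a stance summary is faithful to the source debate would need a separate step, such as the model-based cleaning the authors mention.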

10 retrieved papers, Can Refute
