PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives

ICLR 2026 Conference Submission, Anonymous Authors

OpenReview Score: 6.0

Keywords: Political Consensus Finding, Collective Decision Making, Benchmark on Societal Considerations

Abstract:

Achieving political consensus is crucial yet challenging for effective social governance. Although frontier AI systems, represented by large language models (LLMs), have developed rapidly in recent years, their capabilities in this area remain understudied. In this paper, we introduce PoliCon, a novel benchmark constructed from 2,225 high-quality deliberation records of the European Parliament spanning 13 years (2009 to 2022), to evaluate the ability of LLMs to draft consensus resolutions from divergent party positions under varying collective decision-making contexts and political requirements. Specifically, PoliCon combines four factors to build each task environment for a given political consensus: the political issue, the political goal, the participating parties, and the power structure based on seat distribution. We also develop an evaluation framework for PoliCon based on social choice theory, which simulates how the different political parties would actually vote in order to assess whether LLM-generated resolutions meet the requirements of the predetermined political consensus. Our experimental results show that even state-of-the-art models still fall short on complex tasks such as passing resolutions by a two-thirds majority and addressing security issues. The experiments also uncover the models' inherent partisan biases and reveal strategies LLMs adopt to reach consensus, such as prioritizing the stance of the dominant party rather than uniting smaller parties. These findings highlight PoliCon's promise as an effective platform for studying LLMs' ability to promote political consensus.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PoliCon, a benchmark for evaluating large language models on drafting political consensus resolutions from divergent party positions, built from European Parliament deliberation records. It sits in the 'LLM-Based Consensus Resolution Drafting' leaf of the taxonomy, which contains only two papers, including this one. This leaf is a sparse, emerging research direction within the broader computational consensus generation branch, suggesting that applying LLMs to political resolution drafting is still at an early stage of exploration.

The taxonomy reveals neighboring work in computational belief estimation and polarization analysis, which focuses on detecting agreement points from social media rather than generating resolution text. The broader field includes empirical deliberation analysis examining real-world consensus mechanisms and party position measurement studies that quantify ideological stances. PoliCon bridges computational generation with empirical political science by grounding its tasks in actual parliamentary data, distinguishing it from purely theoretical frameworks or human-centered deliberation studies excluded from its immediate branch.

Thirty candidate papers were examined in total, ten per contribution. For the PoliCon benchmark contribution, one of the ten candidates is marked as refutable; for the social choice theory evaluation framework, none of the ten is a clear refutation; and for the data collection procedure, one of the ten overlaps with prior work. These statistics suggest that the benchmark and data aspects have some precedent, while the evaluation framework appears more distinctive within this limited search scope. Because the candidate pool is small, the assessment captures top semantic matches rather than exhaustive coverage.

Given the sparse taxonomy leaf and the limited literature search, the work appears to occupy relatively novel ground in applying LLMs to political consensus drafting. However, refutable candidates exist for two of the three contributions (in both cases EuroCon [2]), indicating that some foundational elements have precedent. The analysis reflects the top thirty semantic matches and does not claim comprehensive coverage of the field, so relevant work outside this search boundary may exist.

Taxonomy

[Taxonomy figure. Legend: core-task taxonomy papers, claimed contributions, contribution candidate papers compared, refutable paper.]

Research Landscape Overview

Core task: Drafting political consensus resolutions from divergent party positions. The field encompasses several complementary branches. Computational and AI-Based Consensus Generation explores algorithmic and machine learning approaches to synthesizing compromise text from opposing viewpoints, with recent work such as PoliCon[0] and EuroCon[2] using large language models to automate resolution drafting. Empirical Analysis of Political Deliberation and Consensus Processes examines real-world negotiation dynamics, institutional mechanisms, and historical case studies such as Federal Convention Consensus[8] and Conflict to Consensus[9] to understand how consensus emerges in practice. Party Position Measurement and Ideological Analysis focuses on quantifying and mapping ideological stances through expert surveys and scaling methods, exemplified by Chapel Hill Expert Surveys[4] and Unified Party Competition[3], and provides the foundational data that computational methods often require. Campaign Discourse and Rhetorical Strategy Analysis investigates persuasive communication tactics and framing strategies, such as those studied in Debate Rhetorical Strategies[7], which inform how parties articulate and potentially reconcile their positions.

An emerging line of work centers on LLM-based consensus drafting, where systems must balance fidelity to the input positions against the generation of acceptable compromise language. PoliCon[0] sits squarely within this computational branch, using transformer-based models to produce resolution text that bridges partisan divides. Its emphasis on automated synthesis contrasts with the data-driven orientation of works like Chapel Hill Expert Surveys[4], which catalog party positions without attempting generative reconciliation, and differs from empirical studies such as Federal Convention Consensus[8], which analyze human-led deliberation without computational intervention.

Key open questions across these branches are how to validate consensus quality (through expert evaluation, alignment with historical outcomes, or stakeholder acceptance) and how computational approaches like PoliCon[0] can incorporate the nuanced understanding of ideological space and rhetorical strategy that traditional political science provides.

Claimed Contributions

PoliCon benchmark for evaluating LLMs on political consensus

Can Refute

10 retrieved papers

The authors construct PoliCon, a benchmark built from 2,225 European Parliament deliberation records spanning 2009-2022. It evaluates LLMs' ability to generate consensus resolutions under diverse collective decision-making contexts by incorporating four adjustable factors: political issues, political goals, participating parties, and power structures based on seat distribution.
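To make the four adjustable factors concrete, the sketch below shows one way a single deliberation record could be expanded into several task environments by varying the political goal and the subset of participating parties, with the power structure expressed as each party's seat share. The class `TaskEnvironment`, the function `enumerate_tasks`, the `min_parties` parameter, the goal labels, and the party labels and seat counts are all hypothetical illustrations, not PoliCon's actual code or data format.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class TaskEnvironment:
    """Hypothetical container for the four factors that define one task."""
    issue: str                  # political issue under deliberation
    goal: str                   # political goal, e.g. "two_thirds_majority"
    parties: tuple[str, ...]    # participating parties
    seats: dict[str, int]       # power structure: seats held by each party

    def seat_shares(self) -> dict[str, float]:
        """Share of the participating seats held by each party."""
        total = sum(self.seats[p] for p in self.parties)
        return {p: self.seats[p] / total for p in self.parties}


def enumerate_tasks(issue: str, goals: list[str], seats: dict[str, int],
                    min_parties: int = 2) -> list[TaskEnvironment]:
    """Expand one record into tasks over every party subset and goal (illustrative)."""
    parties = sorted(seats)
    tasks = []
    for k in range(min_parties, len(parties) + 1):
        for subset in combinations(parties, k):
            for goal in goals:
                sub_seats = {p: seats[p] for p in subset}
                tasks.append(TaskEnvironment(issue, goal, subset, sub_seats))
    return tasks


# Hypothetical example: three parties and two goals.
tasks = enumerate_tasks("energy security", ["simple_majority", "two_thirds_majority"],
                        {"A": 50, "B": 30, "C": 20})
print(len(tasks))
print(tasks[0].parties, tasks[0].seat_shares())
```

With three parties and two goals this produces eight tasks (four party subsets of size two or more, times two goals). Taking subsets of the original seat counts keeps each task's power structure grounded in the record while the effective majority thresholds shift with the coalition.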

Evaluation framework based on social choice theory

10 retrieved papers

The authors develop an open-ended evaluation framework grounded in social choice theory that first simulates voting outcomes for each political party using an LLM-as-a-judge approach, then maps these votes to quantitative scores based on specific political consensus objectives such as simple majority, two-thirds majority, veto power, Rawlsianism, and Utilitarianism.
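The mapping from simulated votes to scores can be illustrated with a small sketch. It assumes that each party's vote (as produced by the LLM judge) is one of "for", "against", or "abstain"; that majorities are computed over the seats of the participating parties; that veto power means a designated party must not vote against; and that the Rawlsian and utilitarian scores use a hypothetical per-party utility (1 for "for", 1/2 for "abstain", 0 for "against"). These rules, the function `score`, and the example parties and seat counts are assumptions for illustration; the report does not give PoliCon's exact scoring formulas.

```python
from fractions import Fraction

# Hypothetical utility of each simulated vote for Rawlsian/utilitarian scoring.
VOTE_UTILITY = {"for": Fraction(1), "abstain": Fraction(1, 2), "against": Fraction(0)}


def seat_share_for(votes: dict[str, str], seats: dict[str, int]) -> Fraction:
    """Fraction of the participating seats held by parties voting 'for'."""
    total = sum(seats[p] for p in votes)
    in_favour = sum(seats[p] for p, v in votes.items() if v == "for")
    return Fraction(in_favour, total)


def score(objective: str, votes: dict[str, str], seats: dict[str, int],
          veto_parties: frozenset[str] = frozenset()) -> Fraction:
    """Map simulated party votes to a score for one consensus objective."""
    share = seat_share_for(votes, seats)
    if objective == "simple_majority":
        return Fraction(int(share > Fraction(1, 2)))
    if objective == "two_thirds_majority":
        return Fraction(int(share >= Fraction(2, 3)))
    if objective == "veto_power":
        # Passes only if no veto-holding party votes against.
        return Fraction(int(all(votes[p] != "against" for p in veto_parties)))
    utilities = {p: VOTE_UTILITY[v] for p, v in votes.items()}
    if objective == "rawlsianism":
        # Welfare of the worst-off participating party.
        return min(utilities.values())
    if objective == "utilitarianism":
        # Seat-weighted average utility over the participating parties.
        total = sum(seats[p] for p in votes)
        return sum(utilities[p] * seats[p] for p in votes) / total
    raise ValueError(f"unknown objective: {objective}")


# Hypothetical four-party example (seat counts are illustrative only).
seats = {"A": 40, "B": 20, "C": 30, "D": 10}
votes = {"A": "for", "B": "for", "C": "against", "D": "abstain"}
for objective in ["simple_majority", "two_thirds_majority", "rawlsianism", "utilitarianism"]:
    print(objective, score(objective, votes, seats))
print("veto_power", score("veto_power", votes, seats, frozenset({"C"})))
```

Exact `Fraction` arithmetic avoids floating-point ambiguity exactly at the two-thirds threshold. In this example the 60% coalition clears a simple majority but not a two-thirds majority, which mirrors why the harder objectives separate models more sharply.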

Large-scale data collection and cleaning procedure

Can Refute

10 retrieved papers

The authors perform comprehensive data collection from multiple sources (European Parliament website, HowTheyVote, VoteWatch Europe dataset) and apply extensive cleaning procedures using DeepSeek-R1 and rule-based methods. This produces integrated sextuples of issue, topic, background, stances, resolution, and votes for each parliamentary record.
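One way to represent the resulting sextuple, together with a few rule-based checks of the kind such a cleaning step might apply, is sketched below. The `Record` schema, the `clean` function, and the specific rules (whitespace normalization, requiring a non-empty resolution, requiring that every party with a stated stance also has a recorded vote, and restricting vote labels) are hypothetical; the report does not describe PoliCon's actual rules, and the LLM-based cleaning with DeepSeek-R1 is not modeled here.

```python
import re
from dataclasses import dataclass
from typing import Optional

VALID_VOTES = {"for", "against", "abstain"}  # hypothetical vote labels


@dataclass
class Record:
    """Hypothetical schema for one (issue, topic, background, stances, resolution, votes) sextuple."""
    issue: str
    topic: str
    background: str
    stances: dict[str, str]   # party -> stated position
    resolution: str
    votes: dict[str, str]     # party -> "for" / "against" / "abstain"


def normalize(text: str) -> str:
    """Collapse whitespace runs left over from scraping."""
    return re.sub(r"\s+", " ", text).strip()


def clean(record: Record) -> Optional[Record]:
    """Apply illustrative rule-based filters; return None if the record is unusable."""
    if not normalize(record.resolution):
        return None                      # no resolution text
    if set(record.stances) != set(record.votes):
        return None                      # stances and votes cover different parties
    if not set(record.votes.values()) <= VALID_VOTES:
        return None                      # unrecognized vote label
    return Record(
        issue=normalize(record.issue),
        topic=normalize(record.topic),
        background=normalize(record.background),
        stances={p: normalize(s) for p, s in record.stances.items()},
        resolution=normalize(record.resolution),
        votes=dict(record.votes),
    )


# Hypothetical raw record with scraping artifacts in its whitespace.
raw = Record("  Gas supply\n security ", "Energy", "Background text.",
             {"A": "Supports  joint purchasing", "B": "Opposes"},
             "The Parliament calls for coordinated storage.", {"A": "for", "B": "against"})
print(clean(raw))
```

Rejecting a record outright, rather than patching it, keeps every surviving sextuple internally consistent, so each party whose stance the model reads also has a vote against which the simulated outcome can later be checked.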

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[2] EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding

Z Zhang, M Yi, M Wang, F Bai, Z Zheng, Y Kang (2025)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

PoliCon benchmark for evaluating LLMs on political consensus

[2] EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding

Can Refute

[21] How large language models can reshape collective intelligence

Cannot Refute

[22] How does AI compare to the experts in a Delphi setting: simulating medical consensus with large language models

Cannot Refute

[23] Temperature and persona shape LLM agent consensus with minimal accuracy gains in qualitative coding

Cannot Refute

[24] Towards Deliberating Agents: Evaluating the Ability of Large Language Models to Deliberate

Cannot Refute

[25] Towards Insider Summarization for Mediation Instead of Moderation: Examining Wikipedian Views on Key Elements of Discussion Summaries

Cannot Refute

[26] Piloting ModiBot: A Large Language Model-Based Moderator in Normal and Emotionally Challenging Focus Group Interactions

Cannot Refute

[27] CivicParse: A Benchmark and Pipeline for Structured Online Deliberation

Cannot Refute

[28] Artificial intelligence in deliberative democracy

Cannot Refute

[29] A Climatology-specific Large Language Model using Ensemble Projection Data toward Regional Climate Services

Cannot Refute

Contribution

Evaluation framework based on social choice theory

[30] The dynamics of strategic voting: pathways to consensus and gridlock

Cannot Refute

[31] Multi-agent Opinion Pooling by Voting for Bins: Simulations and Characterization

Cannot Refute

[32] Social influence and consensus building: Introducing a q-voter model with weighted influence

Cannot Refute

[33] An Uncertainty- and Collusion-Proof Voting Consensus Mechanism in Blockchain

Cannot Refute

[34] Society Protect: Developing A Voting Mechanism for Blockchain-based AI Alignment

Cannot Refute

[35] Toward decentralization in DPoS systems: Election, voting, and leader selection using virtual stake

Cannot Refute

[36] Weighted voting on the blockchain: Improving consensus in proof of stake protocols

Cannot Refute

[37] Multi-agent Opinion Pooling by Voting

Cannot Refute

[38] Next-Generation Relay Voting Scheme Design Leveraging Consensus Algorithms

Cannot Refute

[39] Deliberative democracy and social choice

Cannot Refute

Contribution

Large-scale data collection and cleaning procedure

[2] EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding

Can Refute

[12] Measuring emotion in parliamentary debates with automated textual analysis

Cannot Refute

[13] A Discourse Analysis Framework for Legislative and Social Media Debates

Cannot Refute

[14] Using Corpora in Discourse Analysis

Cannot Refute

[15] Publishing and using parliamentary Linked Data on the Semantic Web: ParliamentSampo system for Parliament of Finland

Cannot Refute

[16] The Swedish parliament corpus 1867-2022

Cannot Refute

[17] The ParlaMint corpora of parliamentary proceedings

Cannot Refute

[18] Processing Large-Scale Archival Records: The Case of the Swiss Parliamentary Records

Cannot Refute

[19] The Knesset corpus: an annotated corpus of Hebrew parliamentary proceedings (G. Goldin et al.)

Cannot Refute

[20] Adding political orientation metadata to ParlaMint corpora

Cannot Refute
