Towards Strategic Persuasion with Language Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Models, Strategic Behavior, Information Design
Abstract:

Large language models (LLMs) have demonstrated strong persuasive capabilities comparable to those of humans, offering promising benefits while raising societal concerns about their deployment. However, systematically evaluating persuasive capabilities is inherently challenging, as the effectiveness of persuasion among humans varies significantly across domains. In this paper, we take a theory-driven approach to provide a scalable and principled framework for measuring the persuasive capabilities of LLMs in strategic interactions. Grounded in the Bayesian Persuasion (BP) framework, we repurpose existing human-human persuasion datasets to construct environments for evaluating and training LLMs in strategic persuasion. Our results reveal that frontier models consistently achieve high persuasion gains and exhibit sophisticated persuasion strategies that align with theoretical predictions. Building on this, we use reinforcement learning to train LLMs for strategic persuasion in our environments and find that even small LLMs can obtain significantly higher persuasion gains through such training.
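To make the underlying framework concrete, the following is a minimal numeric sketch of the canonical prosecutor-judge example from Kamenica and Gentzkow's Bayesian Persuasion (2011), the model in which the paper's notion of persuasion gain is grounded. The numbers are the textbook example, not values from the paper: with a prior of 0.3 the Receiver never convicts on its own, while the Sender's optimal committed signaling scheme achieves a conviction probability of 0.6, for a persuasion gain of 0.6.

```python
# Minimal sketch of Bayesian Persuasion: the prosecutor-judge example of
# Kamenica & Gentzkow (2011). Textbook numbers, not values from the paper.
import numpy as np

prior = 0.3       # Receiver's prior P(state = guilty)
threshold = 0.5   # Receiver convicts iff posterior P(guilty | signal) >= 0.5

def conviction_prob(q: float) -> float:
    """Conviction probability under a committed scheme that sends signal
    'guilty' with probability 1 when guilty and probability q when innocent."""
    p_signal = prior + (1 - prior) * q   # P(signal = 'guilty')
    posterior = prior / p_signal         # P(guilty | signal = 'guilty')
    return p_signal if posterior >= threshold else 0.0

# With no signaling, the Receiver acts on the prior alone (0.3 < 0.5: acquit),
# so the Sender's baseline utility is 0.
baseline = 1.0 if prior >= threshold else 0.0

# The Sender's optimal commitment dilutes the posterior to exactly the threshold:
best_q = max(np.linspace(0.0, 1.0, 10_001), key=conviction_prob)
gain = conviction_prob(best_q) - baseline
print(f"optimal q = {best_q:.3f}, persuasion gain = {gain:.2f}")  # q ≈ 3/7 ≈ 0.43, gain ≈ 0.6
```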

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work, and the current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a theory-driven framework grounded in Bayesian Persuasion for measuring LLM persuasive capabilities, environments for evaluating strategic persuasion, and a reinforcement learning approach for training persuaders. It resides in the 'Reinforcement Learning for Strategic Persuasion' leaf under 'Training Approaches for Persuasive Language Models', a leaf that contains only two papers in total. This is a relatively sparse direction within the broader fifty-paper taxonomy, suggesting that the specific combination of Bayesian Persuasion theory with RL-based training for strategic influence is not yet heavily explored.

The taxonomy reveals neighboring work in adjacent leaves: 'Persuasive Dataset Construction and Generation' focuses on creating training data through multi-LLM communication, while 'Supervised Training and Fine-Tuning Methods' employs prompt engineering and supervised learning. The sibling paper in the same leaf addresses planning without search in strategic settings, emphasizing efficient decision-making rather than influence optimization. Nearby branches include 'Game-Based Persuasion Environments' and 'Strategic Reasoning Integration in Complex Games', which examine persuasion through game mechanics but typically without the Bayesian theoretical grounding or RL training focus presented here.

Among the twenty-six candidates examined, the contribution-level analysis shows mixed novelty signals. For the theory-driven measurement framework, seven candidates were examined and one appears to provide overlapping prior work, suggesting some precedent exists for principled persuasion evaluation. For the environment-construction contribution, nine candidates were examined and none clearly refutes it, indicating relative novelty in repurposing human-human datasets for LLM strategic persuasion training. For the reinforcement learning training approach, ten candidates were examined with no clear refutations, suggesting that this specific application of RL to Bayesian Persuasion environments may be less explored, at least within the limited search scope.

Based on the limited search over top-K semantic matches and citation expansion, the work appears to occupy a moderately novel position, particularly in combining Bayesian theoretical foundations with RL-based training. The sparse population of its taxonomy leaf and the absence of clear refutations for two of the three contributions support this impression, though the analysis is not an exhaustive literature review and does not cover domain-specific persuasion research beyond the examined candidates.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 26
Refutable papers: 1

Research Landscape Overview

Core task: evaluating and training language models for strategic persuasion. The field encompasses diverse approaches to understanding and enhancing how language models influence human beliefs and decisions. At the highest level, the taxonomy organizes work into six main branches:

- measurement and benchmarking efforts that quantify persuasive capabilities (e.g., Benchmarking Persuasive Language[6], Measuring Persuasiveness[19]);
- multi-agent frameworks exploring strategic interaction in competitive or cooperative settings (e.g., Persuasion Games[15], Diplomacy Language Models[1]);
- conversational persuasion studies examining interactive communication dynamics (e.g., Conversational Persuasiveness Trial[4]);
- training methodologies that optimize models for persuasive outcomes;
- linguistic and mechanistic analyses of how persuasion operates (e.g., Persuasion Mechanisms and Linguistic Analysis);
- safety-oriented research addressing ethical risks (e.g., Dangerous Persuader[17], Democratic Persuasion Risks[29]).

These branches reflect complementary perspectives: some focus on capability development, others on understanding or mitigation, creating a landscape where technical advancement coexists with normative concerns.

Within this landscape, several active lines of work reveal key tensions and open questions. Training approaches range from reinforcement learning methods that optimize for persuasive success to frameworks balancing effectiveness with truthfulness (e.g., Persuasive Truthful Answers[23]). Strategic Persuasion[0] sits squarely in the reinforcement learning branch for training persuasive models, sharing methodological ground with Planning Without Search[50], which also explores strategic decision-making without exhaustive search. While Planning Without Search[50] emphasizes efficient planning mechanisms, Strategic Persuasion[0] focuses specifically on optimizing persuasive strategies through RL, distinguishing itself by targeting influence dynamics rather than general strategic reasoning. Across the field, researchers grapple with trade-offs between model capability and safety, with the generalizability of persuasion across domains (e.g., Diverse Domains Dataset[30]), and with whether persuasive power should be measured, enhanced, or constrained, questions that remain central as language models grow more sophisticated.

Claimed Contributions

Theory-driven framework for measuring LLM persuasive capabilities

The authors propose a principled framework grounded in Bayesian persuasion theory to systematically measure and evaluate the persuasive capabilities of large language models. The framework provides scalable measurements, using persuasion gains and signals as its instruments, and addresses the challenge of evaluating persuasion across heterogeneous domains (a toy computation of this gain is sketched below).

7 retrieved papers · Can Refute
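As a concrete illustration of scoring this gain from evaluation logs, the sketch below averages the Sender's realized utility against the utility it would have earned had the Receiver acted on its prior alone. The episode schema and field names are illustrative assumptions, not the paper's actual interface.

```python
# Hypothetical persuasion-gain scorer over logged episodes. The data layout
# and field names are illustrative assumptions, not the paper's schema.
from statistics import mean

def persuasion_gain(episodes: list[dict]) -> float:
    """Mean Sender utility under persuasion, minus the mean utility the
    Sender would have obtained if the Receiver had acted on its prior."""
    with_signal = mean(ep["sender_utility"] for ep in episodes)
    prior_only = mean(ep["sender_utility_prior_action"] for ep in episodes)
    return with_signal - prior_only

# Toy log: the Receiver was persuaded in 3 of 5 episodes, whereas acting on
# the prior alone would never have favored the Sender.
episodes = [
    {"sender_utility": 1.0, "sender_utility_prior_action": 0.0},
    {"sender_utility": 1.0, "sender_utility_prior_action": 0.0},
    {"sender_utility": 1.0, "sender_utility_prior_action": 0.0},
    {"sender_utility": 0.0, "sender_utility_prior_action": 0.0},
    {"sender_utility": 0.0, "sender_utility_prior_action": 0.0},
]
print(persuasion_gain(episodes))  # 0.6
```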
Environments for evaluating and training LLMs in strategic persuasion

The authors construct scalable environments by repurposing existing human persuasion datasets, enabling both evaluation and training of LLMs in strategic persuasion settings. These environments implement both the Sender and the Receiver role using LLMs within the Bayesian persuasion framework (a minimal episode loop is sketched below).

9 retrieved papers
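The sketch below shows what a single episode of such an environment might look like, with both roles exposed as plain callables. The prompts, the binary accept/reject action space, and the stub models are illustrative assumptions; the paper's actual environment interface may differ.

```python
# One Sender/Receiver episode with both BP roles played by (stubbed) LLMs.
# Prompts and the accept/reject action space are illustrative assumptions.
import random
from typing import Callable

def run_episode(
    sender_llm: Callable[[str], str],    # prompt -> persuasive message
    receiver_llm: Callable[[str], str],  # prompt -> action string
    prior: float = 0.3,
) -> float:
    """Nature draws the state; the Sender (who observes it) sends a message;
    the Receiver (who knows only the prior) acts. Returns Sender utility."""
    state = "guilty" if random.random() < prior else "innocent"
    message = sender_llm(
        f"You are the Sender. The true state is '{state}'. "
        "Persuade the Receiver to choose 'accept'."
    )
    action = receiver_llm(
        f"You are the Receiver. Your prior that the state is 'guilty' is {prior}. "
        f"The Sender says: {message!r}. Reply 'accept' or 'reject'."
    )
    return 1.0 if action.strip().lower().startswith("accept") else 0.0

# Stub models make the sketch runnable end to end; swap in real LLM calls.
utility = sum(run_episode(lambda p: "Trust me.", lambda p: "reject")
              for _ in range(100)) / 100
print(utility)  # 0.0 with these always-rejecting stubs
```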
Reinforcement learning approach for training strategic persuaders

The authors develop a reinforcement learning framework to train LLMs as strategic persuaders, demonstrating that even small LLMs can achieve significantly higher persuasion gains through RL training. The approach maximizes a persuasion reward defined as the utility gain over acting on prior beliefs alone (a toy training loop is sketched below).

10 retrieved papers
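The sketch below illustrates the training idea in miniature: a tabular Sender policy trained with REINFORCE to maximize the persuasion reward, i.e., utility gain over the Receiver's prior-only action, against a fixed black-box Receiver standing in for a frozen Receiver LLM. The message space, acceptance probabilities, and hyperparameters are invented for illustration; this is a sketch of the technique, not the paper's training recipe.

```python
# Toy REINFORCE loop: a tabular Sender policy learns which message to send in
# each state to maximize persuasion reward against a fixed black-box Receiver.
# All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
prior = 0.3                  # P(state = 1)
theta = np.zeros((2, 3))     # softmax logits: theta[state, message]

# Fixed Receiver: acceptance probability given (state, message). In the
# paper's setting this role would be a frozen Receiver LLM.
accept_prob = np.array([[0.6, 0.3, 0.1],
                        [0.2, 0.5, 0.8]])

# Persuasion reward is the utility gain over the Receiver's prior-only action;
# with prior 0.3 < 0.5 the prior-only action is rejection, so baseline = 0.
baseline = 1.0 if prior >= 0.5 else 0.0

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(5_000):
    state = int(rng.random() < prior)
    probs = softmax(theta[state])
    message = rng.choice(3, p=probs)
    accepted = rng.random() < accept_prob[state, message]
    reward = float(accepted) - baseline         # persuasion reward
    grad_log_pi = -probs
    grad_log_pi[message] += 1.0                 # score function of softmax policy
    theta[state] += 0.2 * reward * grad_log_pi  # REINFORCE ascent step

# The learned policy should favor message 0 in state 0 and message 2 in state 1.
print(np.round(softmax(theta[0]), 2), np.round(softmax(theta[1]), 2))
```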

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Theory-driven framework for measuring LLM persuasive capabilities

Contribution 2: Environments for evaluating and training LLMs in strategic persuasion

Contribution 3: Reinforcement learning approach for training strategic persuaders
