Adaptive Social Learning via Mode Policy Optimization for Language Agents

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 7.0

Keywords: Social Intelligence, Large Language Models, Adaptive Social Learning

Abstract:

Effective social intelligence simulation requires language agents to dynamically adjust reasoning depth, a capability notably absent in current studies. Existing methods either lack explicit reasoning or apply lengthy Chain-of-Thought reasoning uniformly across all scenarios, resulting in excessive token usage and inflexible social behaviors in tasks such as negotiation or collaboration. To address this, we propose an Adaptive Social Learning (ASL) framework that aims to improve the adaptive reasoning ability of language agents in dynamic social interactions. To this end, we first identify, based on cognitive control theory, the hierarchical reasoning modes for this context, ranging from intuitive response to deep deliberation. We then develop the Adaptive Mode Policy Optimization (AMPO) algorithm to learn context-aware mode adaptation and reasoning. Our framework advances existing research in three key aspects: (1) multi-granular reasoning mode design, (2) context-aware mode switching in rich social interaction, and (3) token-efficient reasoning with depth adaptation. Extensive experiments on the benchmark social intelligence environment verify that ASL achieves 15.6% higher task performance than GPT-4o. Notably, AMPO outperforms GRPO by 7.0% with 32.8% shorter thinking chains, demonstrating the advantages of AMPO and of the learned adaptive reasoning ability over GRPO's solution.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes an Adaptive Social Learning (ASL) framework enabling language agents to dynamically adjust reasoning depth in social interactions, from intuitive responses to deep deliberation. It resides in the Language Agent Adaptive Reasoning Systems leaf, which contains only two papers in total. This sparse population suggests that the specific combination of language-model-based agents with adaptive reasoning depth in social tasks is an emerging rather than crowded research direction within the broader Computational Agent Frameworks branch.

The taxonomy reveals that neighboring leaves address trajectory prediction, embodied agents, and computational models of social norms, but none explicitly tackle adaptive reasoning depth in language-based social agents. The broader Computational Agent Frameworks branch contrasts sharply with the Human Cognitive and Social Processes branch, which contains nine papers on cognitive flexibility in educational contexts alone. This structural asymmetry indicates that while human adaptive reasoning is well-studied, computational implementations for language agents remain relatively underexplored, particularly those integrating hierarchical reasoning modes with context-aware switching.

Of the twenty-six candidates examined, ten were compared against the ASL framework contribution, and one of them was judged able to refute it; the six candidates compared against the AMPO algorithm and the ten compared against the hierarchical reasoning modes yielded no refutations. Because the search was limited in scope, these statistics reflect top-K semantic matches rather than exhaustive coverage. Within this bounded search, the AMPO algorithm and the reasoning mode design appear more novel, whereas the broader ASL framework concept overlaps with at least one prior work among the examined candidates.
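The per-contribution counts above can be cross-checked with a short tally. The dictionary simply restates the numbers given in this assessment (candidates examined, candidates judged able to refute); the names `CANDIDATES` and `summarize` are illustrative only.

```python
# Per-contribution counts as stated in this assessment:
# (candidates examined, candidates judged able to refute)
CANDIDATES = {
    "ASL framework": (10, 1),
    "AMPO algorithm": (6, 0),
    "Hierarchical reasoning modes": (10, 0),
}


def summarize(candidates):
    """Return total examined, total refutable, and per-contribution refutation rates."""
    total_examined = sum(n for n, _ in candidates.values())
    total_refutable = sum(k for _, k in candidates.values())
    rates = {name: k / n for name, (n, k) in candidates.items()}
    return total_examined, total_refutable, rates


total, refutable, rates = summarize(CANDIDATES)
print(f"{refutable} refutable of {total} examined")
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

The three groups sum to the twenty-six candidates mentioned in the text, with a single refutable match concentrated in the framework-level contribution.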

Based on the top twenty-six semantic matches and the taxonomy structure, the work addresses a sparsely populated research direction with limited direct prior work. The analysis does not cover the full literature landscape, particularly domain-specific applications or recent preprints outside the search scope. The hierarchical reasoning modes and token-efficient adaptation appear to offer substantive contributions, though the framework-level novelty is tempered by at least one identified overlap.

Taxonomy

[Figure: taxonomy tree of the core task. Legend: core-task taxonomy papers; claimed contributions; contribution candidate papers compared; refutable paper.]

Research Landscape Overview

Core task: adaptive reasoning in dynamic social interactions. This field examines how agents, whether computational, human, or animal, adjust their cognitive strategies and behaviors in response to evolving social contexts. The taxonomy reveals six major branches that together capture the breadth of this challenge. Computational Agent Frameworks for Social Interaction focus on building artificial systems capable of flexible reasoning and learning in multi-agent environments, often leveraging language models and reinforcement learning architectures such as Adaptive Mobile Agent[3] and Adaptive Thinking Mode[1]. Human Cognitive and Social Processes explore psychological mechanisms underlying flexibility, including cognitive flexibility training interventions (Cognitive Flexibility Training[11], Cognitive Flexibility Support[10]) and the interplay between emotion regulation, mental flexibility, and social competence (Emotion Regulation Mediation[2], Cognitive Flexibility Anxiety[24]). Neuroscience and Biological Mechanisms investigate neural substrates and developmental factors, such as hippocampal contributions to social learning (Hippocampus Social Learning[14]) and the impact of early stress on adaptive capacities (Early Stress Impairment[18]). Theoretical and Methodological Frameworks provide formal models, including Bayesian approaches to joint action (Bayesian Joint Action[43]) and complex adaptive systems theory (Complex Adaptive Systems[36]). Applied and Domain-Specific Interaction Studies address real-world settings like therapeutic responsiveness (Therapist Interpersonal Responsiveness[27]) and driver interactions (Driver Social Interactions[23]), while Animal and Comparative Studies examine adaptive foraging and environmental enrichment effects (Adaptive Social Foraging[41], Environmental Enrichment Flexibility[45]).

Several active lines of work highlight key trade-offs and open questions. One prominent theme contrasts top-down cognitive training interventions aimed at enhancing flexibility with bottom-up investigations of how environmental and affective factors shape adaptive capacities, raising questions about the relative malleability of these processes across development and contexts. Another tension emerges between formal computational models that seek to capture reasoning dynamics in tractable frameworks and empirical studies documenting the messy, context-dependent nature of real social interactions. Adaptive Social Learning[0] sits squarely within the Computational Agent Frameworks branch, specifically among Language Agent Adaptive Reasoning Systems. Its emphasis on learning-driven adaptation in social contexts aligns closely with Adaptive Thinking Mode[1], which similarly explores how agents modulate reasoning strategies. Compared to more domain-specific applied work or neuroscience-focused studies, Adaptive Social Learning[0] prioritizes the design of general-purpose computational architectures that can flexibly adjust to diverse social scenarios, positioning it as a bridge between theoretical models of adaptive reasoning and practical agent deployment.

Claimed Contributions

Adaptive Social Learning (ASL) framework for language agents

Can Refute

10 retrieved papers

The authors introduce ASL, a novel framework that enables language agents to dynamically adjust their reasoning depth in social interactions. It combines hierarchical reasoning modes inspired by cognitive control theory with reinforcement learning to achieve context-aware adaptive reasoning in dynamic social environments.


Adaptive Mode Policy Optimization (AMPO) algorithm

6 retrieved papers

The authors develop AMPO, a reinforcement learning algorithm that incorporates both mode-level and sample-level information into advantage estimation. This enables context-aware reasoning mode switching while improving token efficiency and flexible inference in social interactions.
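The description above says only that AMPO feeds both mode-level and sample-level information into advantage estimation. The sketch below is one hypothetical way to do that, not the authors' formulation: the sample-level term is a GRPO-style group normalization of rewards, the mode-level term normalizes the mean reward of each reasoning mode used within the same group, and the two are simply added. The function names, the additive combination, and the `eps` smoothing constant are all assumptions.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-6):
    """Group-relative normalization: subtract the mean, divide by the std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def mode_aware_advantages(rewards, modes, eps=1e-6):
    """Hypothetical advantage mixing sample-level and mode-level signals.

    rewards[i] and modes[i] describe the i-th rollout sampled for one context.
    """
    # Sample-level term: GRPO-style normalization over the whole group.
    sample_adv = normalize(rewards, eps)
    # Mode-level term: compare the mean reward of each mode used in the group.
    used = sorted(set(modes))
    mode_mean = [mean(r for r, m in zip(rewards, modes) if m == k) for k in used]
    if len(used) > 1:
        mode_adv = dict(zip(used, normalize(mode_mean, eps)))
    else:
        mode_adv = {used[0]: 0.0}  # only one mode sampled: nothing to compare
    return [s + mode_adv[m] for s, m in zip(sample_adv, modes)]


rewards = [1.0, 1.0, 0.0, 0.0]
modes = [1, 1, 3, 3]  # hypothetical mode indices, shallow (1) to deep (3)
print([round(a, 3) for a in mode_aware_advantages(rewards, modes)])
```

In this toy group the shallow mode earns higher reward, so its rollouts receive a boost beyond their sample-level score, which is the kind of pressure that could favor cheaper modes when they suffice. Adding the two terms is only one design choice; a weighted sum, or a rule that switches between terms when mode rewards tie, would be equally plausible, and the report does not say which AMPO uses.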


Hierarchical reasoning modes for social intelligence

10 retrieved papers

The authors design a hierarchy of reasoning modes based on cognitive control theory, ranging from intuitive responses to deep deliberation. These modes enable multi-granular reasoning and context-aware mode switching in social interactions, addressing the limitation of uniform reasoning approaches.
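To make "multi-granular" and "context-aware switching" concrete, the sketch below shows one hypothetical way a mode hierarchy could be represented and how the mode chosen on each turn could be read off an agent's output. Only the two ends of the hierarchy (intuitive response, deep deliberation) come from the report; the four levels, their middle names, and the `<mode_k>...</mode_k>` tag format are placeholders, not the paper's specification.

```python
import re
from enum import IntEnum


class ReasoningMode(IntEnum):
    """Hypothetical reasoning modes ordered from shallow to deep."""
    INTUITIVE = 1     # report: intuitive response
    ANALYTIC = 2      # placeholder intermediate level
    STRATEGIC = 3     # placeholder intermediate level
    DELIBERATIVE = 4  # report: deep deliberation


# Hypothetical turn format: "<mode_k>thinking</mode_k> final utterance".
MODE_TAG = re.compile(r"<mode_(\d+)>(.*?)</mode_\1>\s*(.*)", re.DOTALL)


def parse_turn(text):
    """Split one agent turn into (mode, thinking, utterance)."""
    match = MODE_TAG.fullmatch(text.strip())
    if match is None:
        raise ValueError("turn does not start with a mode tag")
    mode = ReasoningMode(int(match.group(1)))  # ValueError for unknown levels
    return mode, match.group(2).strip(), match.group(3).strip()


def thinking_length(text):
    """Rough thinking cost, counted in whitespace-separated tokens."""
    _, thinking, _ = parse_turn(text)
    return len(thinking.split())


# The mode can differ from turn to turn; deeper modes spend more thinking tokens.
quick = "<mode_1></mode_1> Sounds good, let's split it evenly."
deep = ("<mode_4>They value the book most; offering it costs us little "
        "and secures the hats.</mode_4> I'll take the hats if you take the book.")
for turn in (quick, deep):
    mode, _, utterance = parse_turn(turn)
    print(mode.name, thinking_length(turn), utterance)
```

Using an ordered `IntEnum` keeps "depth" comparable across modes, which is convenient when tracking how often, and at what token cost, an agent chooses deeper reasoning.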


Core Task Comparisons

Comparisons with papers in the same taxonomy category

[1] Adaptive Thinking via Mode Policy Optimization for Social Language Agents

Minzheng Wang, Yongbin Li, Haobo Wang, Xinghua Zhang, Nan Xu, Bingli Wu, Fei Huang, Haiyang Yu, Wenji Mao (2025)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Adaptive Social Learning (ASL) framework for language agents

[1] Adaptive Thinking via Mode Policy Optimization for Social Language Agents

Can Refute

[67] Inadequacies of large language model benchmarks in the era of generative artificial intelligence

Cannot Refute

[68] Agentic large language models, a survey

Cannot Refute

[69] K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning

Cannot Refute

[70] DARG: Dynamic evaluation of large language models via adaptive reasoning graph

Cannot Refute

[71] Social-LLaVA: Enhancing robot navigation through human-language reasoning in social spaces

Cannot Refute

[72] SCOOP: A Framework for Proactive Collaboration and Social Continual Learning through Natural Language Interaction and Causal Reasoning

Cannot Refute

[73] A Reflective Architecture for LLM-Based Systems

Cannot Refute

[74] AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

Cannot Refute

[75] Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena

Cannot Refute

Contribution

Adaptive Mode Policy Optimization (AMPO) algorithm

[61] A dual reinforcement learning framework for unsupervised text style transfer

Cannot Refute

[62] Effective Reinforcement Learning for Reasoning in Language Models

Cannot Refute

[63] Soft policy optimization using dual-track advantage estimator

Cannot Refute

[64] SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments

Cannot Refute

[65] REPAINT: Knowledge Transfer in Deep Actor-Critic Reinforcement Learning

Cannot Refute

[66] Value-Anchored Group Policy Optimization for Flow Models

Cannot Refute

Contribution

Hierarchical reasoning modes for social intelligence

[51] Simultaneous learning and planning in a hierarchical control system for a cognitive agent

Cannot Refute

[52] Multi-level compositional reasoning for interactive instruction following

Cannot Refute

[53] Navigating the affordance landscape: feedback control as a process model of behavior and cognition

Cannot Refute

[54] Hierarchical Reasoning Model

Cannot Refute

[55] Generalized dynamic cognitive hierarchy models for strategic driving behavior

Cannot Refute

[56] A Dynamic Selective Parameter Sharing Mechanism Embedded with Multi-Level Reasoning Abstractions

Cannot Refute

[57] Cognition is All You Need--The Next Layer of AI Above Large Language Models

Cannot Refute

[58] Multi-level simulation of the physical, cognitive and social

Cannot Refute

[59] Multi-Level Online Learning and Reasoning for Self-Integrating Systems

Cannot Refute

[60] Conceptual framework for autonomous cognitive entities

Cannot Refute
