Context Learning for Multi-Agent Discussion

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Models, Context Learning, Multi-Agent Discussion
Abstract:

Multi-Agent Discussion (MAD) has recently garnered increasing attention: multiple LLM instances collaboratively solve problems via structured discussion. However, we find that current MAD methods readily suffer from discussion inconsistency, where the LLMs fail to reach a coherent solution, due to the misalignment between their individual contexts. In this paper, we introduce a multi-LLM context learning method (M2CL) that learns a context generator for each agent, capable of dynamically generating context instructions at every discussion round via automatic information organization and refinement. Specifically, guided by our theoretical insights on context instructions, M2CL trains the generators to control context coherence and output discrepancies via a carefully crafted self-adaptive mechanism. This enables the LLMs to avoid premature convergence on "majority noise" and to progressively reach the correct consensus. We evaluate M2CL on challenging tasks spanning academic reasoning, embodied tasks, and mobile control. The results show that M2CL significantly surpasses existing methods by 20%-50%, while enjoying favorable transferability and computational efficiency.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes M2CL, a multi-LLM context learning method that trains context generators to dynamically produce instructions per discussion round, addressing discussion inconsistency in multi-agent systems. It resides in the 'Context Learning and Communication Optimization' leaf, which contains only three papers total, including this one. This represents a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting the specific focus on learned context generation for multi-agent discussion is not yet heavily explored.

The taxonomy reveals that M2CL's leaf sits within 'Collaboration Mechanisms and Communication Protocols', adjacent to leaves focused on debate mechanisms, aggregation methods, and general collaboration strategies. Neighboring work includes structured debate approaches and voting-based consensus methods, which typically rely on fixed protocols rather than learned context adaptation. The taxonomy's scope notes clarify that this leaf specifically covers learning-based communication optimization, distinguishing it from static protocols or debate-without-learning approaches found in sibling branches.

Among the 27 candidates examined across the three contributions, no clearly refuting prior work was identified: 10 candidates were examined for the core M2CL method, 7 for the lightweight initialization approach, and 10 for the self-adaptive balancing mechanism, with zero refutations in each case. Given the limited search scope (top-K semantic matches plus citation expansion), this suggests that, within the examined literature, the specific combination of learned context generation, self-adaptive balancing, and multi-round refinement appears relatively unexplored, though the analysis does not claim exhaustive coverage.

Based on the limited search of 27 candidates, the work appears to occupy a distinct position within context learning for multi-agent systems. The sparse population of its taxonomy leaf and absence of refuting candidates among those examined suggest potential novelty, though the analysis acknowledges it cannot rule out relevant work outside the top-K semantic neighborhood or in adjacent research communities not captured by this search strategy.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: Multi-agent discussion with large language models. The field has evolved into a structured landscape with several major branches. Multi-Agent System Architectures and Frameworks establish the foundational designs, ranging from modular platforms like AutoGen[4] to specialized coordination schemes, while Collaboration Mechanisms and Communication Protocols explore how agents exchange information, negotiate roles, and refine shared context. Training and Optimization Methods address learning strategies that improve agent policies and coordination over time, often blending reinforcement learning with LLM fine-tuning. Evaluation and Benchmarking provide standardized testbeds and metrics to compare different multi-agent setups, and Domain-Specific Applications demonstrate how these systems tackle real-world problems in areas such as autonomous driving, medical diagnosis, and strategic games. Finally, Surveys and Theoretical Foundations synthesize emerging principles and offer conceptual frameworks for understanding agent interactions at scale.

Within Collaboration Mechanisms and Communication Protocols, a particularly active line of work focuses on context learning and communication optimization: how agents dynamically adapt their messaging strategies and shared representations to improve collective reasoning. Context Learning for Multi-Agent[0] sits squarely in this cluster, emphasizing methods that let agents refine contextual cues during discussion rounds. Nearby efforts like Talk Structurally, Act Hierarchically[32] and Beyond Self-Talk[37] explore structured communication patterns and richer inter-agent dialogue, highlighting trade-offs between rigid protocols and flexible, emergent exchanges. Other branches investigate debate-driven refinement (e.g., Multi-Agent Debate Strategies[6]) or coordination under resource constraints (e.g., LLM-Coordination[8]), raising open questions about when to prioritize consensus versus diversity of viewpoints.

The original paper's focus on context learning places it at the intersection of communication design and adaptive optimization, contrasting with works that rely more heavily on predefined interaction templates or external orchestration layers.

Claimed Contributions

Multi-LLM context learning method (M2CL)

The authors propose M2CL, a method that trains context generators for each agent in multi-agent discussion systems. These generators dynamically produce context instructions at each discussion round through automatic information organization and refinement, addressing the problem of discussion inconsistency caused by context misalignment between LLMs.

10 retrieved papers
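The round-by-round flow described in this contribution can be sketched schematically. This is a hedged illustration only: `run_discussion`, `agents`, and `generators` are hypothetical stand-ins for the learned context generators and LLM calls, whose actual interfaces the report does not specify.

```python
def run_discussion(agents, generators, question, rounds=3):
    """Schematic multi-agent discussion loop: each round, every agent's
    generator turns the shared transcript into a fresh context instruction,
    the agent answers under that instruction, and the transcript grows."""
    transcript = []
    answers = {}
    for r in range(rounds):
        for name, agent in agents.items():
            context = generators[name](question, transcript, r)  # learned generator (stand-in)
            answers[name] = agent(question, context)             # LLM call (stand-in)
        transcript.append(dict(answers))  # expose this round's answers to the next round
    return answers

# Toy stand-ins so the sketch runs: generators that report the round and the
# amount of history seen, and agents that simply echo their context.
agents = {f"a{i}": (lambda q, c: "ans|" + c) for i in range(2)}
generators = {f"a{i}": (lambda q, t, r: f"round{r}:seen{len(t)}") for i in range(2)}
final = run_discussion(agents, generators, "2+2?", rounds=2)
```

The key structural point is that context instructions are regenerated every round from the accumulated transcript, rather than fixed up front.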
Lightweight context initialization approach

The authors develop a context initialization method that assigns diverse initial instructions to LLMs. These instructions are approximately orthogonal in latent space, enabling sufficient coverage of complementary solution perspectives and expanding the search space for solutions.

7 retrieved papers
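The "approximately orthogonal in latent space" property can be illustrated with a small sketch. The paper's actual embedding model and dimensionality are not given here; QR decomposition of a random Gaussian matrix is one standard way to obtain orthonormal directions and is used purely as an assumed mechanism.

```python
import numpy as np

def init_context_directions(num_agents: int, dim: int, seed: int = 0) -> np.ndarray:
    """Draw `num_agents` mutually orthogonal unit vectors in a
    `dim`-dimensional latent space via QR decomposition of a random
    Gaussian matrix (requires num_agents <= dim)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((dim, num_agents))
    q, _ = np.linalg.qr(g)  # columns of q are orthonormal
    return q.T              # one unit vector per agent

directions = init_context_directions(num_agents=4, dim=64)
gram = directions @ directions.T  # close to the 4x4 identity: near-zero pairwise overlap
```

Near-zero pairwise overlap is what gives each agent a complementary, non-redundant starting perspective.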
Self-adaptive balancing mechanism for context evolution

The authors devise a self-adaptive mechanism that trains context generators to balance context coherence and output discrepancies. This mechanism enables LLMs to avoid premature convergence on majority noise while progressively reaching correct consensus during multi-round discussions.

10 retrieved papers
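The coherence/discrepancy trade-off described above can be made concrete with a toy objective. The paper's actual losses and adaptation rule are not specified in this report; the function below is a hypothetical sketch in which a schedule weight `alpha` anneals from rewarding output diversity early to enforcing context agreement late.

```python
import numpy as np

def adaptive_balance_loss(contexts, outputs, round_idx, total_rounds):
    """Toy objective in the spirit of the described mechanism: penalize
    context incoherence, reward output discrepancy early on, and anneal
    toward consensus over rounds.

    contexts, outputs: (num_agents, dim) arrays of latent representations.
    """
    def mean_pairwise_dist(x):
        diff = x[:, None, :] - x[None, :, :]
        return np.sqrt((diff ** 2).sum(-1)).mean()

    coherence_term = mean_pairwise_dist(contexts)   # low  => aligned contexts
    discrepancy_term = mean_pairwise_dist(outputs)  # high => diverse answers
    # Schedule weight: favor diversity in early rounds, consensus later.
    alpha = round_idx / max(total_rounds - 1, 1)    # 0 -> 1 over the discussion
    return alpha * coherence_term - (1.0 - alpha) * discrepancy_term
```

At `round_idx=0` the loss rewards disagreement among outputs (guarding against premature convergence on majority noise); by the final round it penalizes only residual context misalignment.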

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Multi-LLM context learning method (M2CL)

The authors propose M2CL, a method that trains context generators for each agent in multi-agent discussion systems. These generators dynamically produce context instructions at each discussion round through automatic information organization and refinement, addressing the problem of discussion inconsistency caused by context misalignment between LLMs.

Contribution

Lightweight context initialization approach

The authors develop a context initialization method that assigns diverse initial instructions to LLMs. These instructions are approximately orthogonal in latent space, enabling sufficient coverage of complementary solution perspectives and expanding the search space for solutions.

Contribution

Self-adaptive balancing mechanism for context evolution

The authors devise a self-adaptive mechanism that trains context generators to balance context coherence and output discrepancies. This mechanism enables LLMs to avoid premature convergence on majority noise while progressively reaching correct consensus during multi-round discussions.