Benefits and Limitations of Communication in Multi-Agent Reasoning

ICLR 2026 Conference SubmissionAnonymous Authors
chain-of-thought promptingmulti-agent systemsreasoningexpressivityinter-agent communicationscalabilitylarge language modelsalgorithmic analysis
Abstract:

Chain-of-thought prompting has popularized step-by-step reasoning in large language models, yet model performance still degrades as problem complexity and context length grow. By decomposing difficult tasks with long contexts into shorter, manageable ones, recent multi-agent paradigms offer a promising near-term solution to this problem. However, the fundamental capacities of such systems are poorly understood. In this work, we propose a theoretical framework to analyze the expressivity of multi-agent systems. We apply our framework to three algorithmic families: state tracking, recall, and kk-hop reasoning. We derive bounds on (i) the number of agents required to solve the task exactly, (ii) the quantity and structure of inter-agent communication, and (iii) the achievable speedups as problem size and context scale. Our results identify regimes where communication is provably beneficial, delineate tradeoffs between agent count and bandwidth, and expose intrinsic limitations when either resource is constrained. We complement our theoretical analysis with a set of experiments on pretrained LLMs using controlled synthetic benchmarks. Empirical outcomes confirm the tradeoffs between key quantities predicted by our theory. Collectively, our analysis offers principled guidance for designing scalable multi-agent reasoning systems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a theoretical framework for analyzing multi-agent system expressivity under communication constraints, deriving bounds on agent count, communication structure, and achievable speedups for state tracking, recall, and k-hop reasoning tasks. It occupies the 'Formal Expressivity and Complexity Bounds' leaf within the 'Theoretical Foundations and Expressivity Analysis' branch. Notably, this leaf contains only the original paper itself—no sibling papers appear in the taxonomy—indicating a sparse research direction focused on formal complexity analysis rather than empirical system design or applied algorithm development.

The taxonomy reveals that neighboring leaves address verification and strategic properties under imperfect information, while sibling branches explore consensus control, multi-agent reinforcement learning with communication, and planning-based coordination. The original paper diverges from these directions by prioritizing formal expressivity bounds over protocol design or convergence guarantees. Its scope excludes empirical validation without formal analysis and applied algorithm implementations, positioning it as foundational theory rather than a method for practical deployment. This boundary distinguishes it from nearby work on learned communication protocols or control-theoretic consensus algorithms.

Among thirty candidates examined through semantic search and citation expansion, none were found to refute any of the three core contributions. The first contribution—a theoretical framework for expressivity analysis—examined ten candidates with zero refutable overlaps. The second contribution—bounds on agents, communication, and speedups—similarly examined ten candidates with no prior work providing overlapping results. The third contribution—empirical validation using controlled synthetic benchmarks—also examined ten candidates without refutation. This suggests that within the limited search scope, the formal complexity perspective on multi-agent reasoning appears relatively unexplored, though the search does not claim exhaustive coverage of all relevant literature.

Based on the top-thirty semantic matches and the taxonomy structure, the work appears to occupy a novel position at the intersection of formal complexity theory and multi-agent systems. The absence of sibling papers in its taxonomy leaf and the lack of refutable prior work among examined candidates suggest limited direct competition within the analyzed scope. However, the search scale remains modest, and broader literature on distributed algorithms or communication complexity outside the examined set may contain relevant theoretical results not captured here.

Taxonomy

Core-task Taxonomy Papers
50
3
Claimed Contributions
30
Contribution Candidate Papers Compared
0
Refutable Paper

Research Landscape Overview

Core task: multi-agent reasoning with communication constraints. The field addresses how groups of autonomous agents coordinate and make decisions when their ability to exchange information is limited by bandwidth, topology, or timing restrictions. The taxonomy reveals several major branches: Theoretical Foundations and Expressivity Analysis examines formal bounds on what agents can achieve under various communication models; Consensus and Formation Control focuses on distributed algorithms for agreement and geometric coordination; Multi-Agent Reinforcement Learning with Communication studies learned protocols that adapt message-passing strategies; Planning and Task Coordination develops methods for joint action selection under partial observability; Large Language Model-Based Multi-Agent Systems explores reasoning architectures built on foundation models; Domain-Specific Applications and Systems applies these ideas to robotics, power grids, and other real-world settings; and Surveys and General Frameworks provide overviews and unifying perspectives. Representative works like Graph Attention Teaming[3] and Deep RL Communication Survey[2] illustrate how learning-based approaches handle dynamic communication graphs, while Consensus Communication Constraints[1] exemplifies classical control-theoretic methods. Several active lines of work highlight key trade-offs between communication efficiency and coordination quality. Learning-based methods such as Attentional Communication[22] and Information Bottleneck Communication[28] seek to compress messages while preserving task-relevant information, whereas control-oriented approaches like Adaptive Consensus Tracking[12] and Asynchronous Consensus Dynamics[34] ensure convergence guarantees under intermittent or delayed exchanges. Communication Benefits Limitations[0] sits within the Theoretical Foundations branch, specifically addressing formal expressivity and complexity bounds. Its emphasis on characterizing fundamental limits contrasts with nearby empirical studies like Graph Attention Teaming[3], which demonstrates learned communication in practice, and complements formal planning work such as Bounded Communication Planning[46], which explores algorithmic feasibility under strict message budgets. Open questions remain around bridging provable guarantees with scalable learning and adapting theoretical insights to heterogeneous, real-world multi-agent deployments.

Claimed Contributions

Theoretical framework for analyzing multi-agent system expressivity

The authors introduce a formalization of multi-agent reasoning systems grounded in Transformer expressivity literature. This framework enables rigorous analysis of communication protocols, agent count requirements, and computational tradeoffs in collaborative reasoning systems.

10 retrieved papers
Bounds on agents, communication, and speedups for three algorithmic families

For state tracking, recall, and k-hop reasoning tasks, the authors establish theoretical bounds characterizing the minimum number of agents needed, the communication overhead required, and the potential speedups achievable. These bounds reveal fundamental tradeoffs between agent count and communication bandwidth.

10 retrieved papers
Empirical validation using controlled synthetic benchmarks

The authors implement optimal communication protocols derived from theory and test them on pretrained language models. Experiments confirm that empirical performance in accuracy, communication cost, and token usage aligns closely with theoretical predictions, validating the framework.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Theoretical framework for analyzing multi-agent system expressivity

The authors introduce a formalization of multi-agent reasoning systems grounded in Transformer expressivity literature. This framework enables rigorous analysis of communication protocols, agent count requirements, and computational tradeoffs in collaborative reasoning systems.

Contribution

Bounds on agents, communication, and speedups for three algorithmic families

For state tracking, recall, and k-hop reasoning tasks, the authors establish theoretical bounds characterizing the minimum number of agents needed, the communication overhead required, and the potential speedups achievable. These bounds reveal fundamental tradeoffs between agent count and communication bandwidth.

Contribution

Empirical validation using controlled synthetic benchmarks

The authors implement optimal communication protocols derived from theory and test them on pretrained language models. Experiments confirm that empirical performance in accuracy, communication cost, and token usage aligns closely with theoretical predictions, validating the framework.