Aegis: Automated Error Generation and Identification for Multi-Agent Systems

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Multi-Agent Systems; Failure Attribution; Automated Data Generation; Learning
Abstract:

Large language model (LLM)-based multi-agent systems (MAS) have unlocked significant advances in tackling complex problems, but their increasing capability introduces a structural fragility that makes them difficult to debug. A key obstacle to improving their reliability is the severe scarcity of large-scale, diverse datasets for error attribution, as existing resources rely on costly and unscalable manual annotation. To address this bottleneck, we introduce Aegis, a novel framework for Automated error generation and attribution for multi-agent systems. Aegis constructs a large dataset of 9,533 trajectories with annotated faulty agents and error modes, covering diverse MAS architectures and task domains. This is achieved using an LLM-based manipulator that adaptively injects context-aware errors into successful execution trajectories. Leveraging the fine-grained labels and the structured arrangement of positive-negative sample pairs, Aegis supports three learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning, and we develop learning methods for each. Comprehensive experiments show that the trained models consistently achieve substantial improvements in error attribution. Notably, several of our fine-tuned LLMs perform competitively with, or better than, proprietary models an order of magnitude larger, validating our automated data-generation framework as a crucial resource for developing more robust and interpretable multi-agent systems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Aegis, a framework for automated error generation and attribution in LLM-based multi-agent systems, producing 9,533 annotated trajectories with faulty agents and error modes. Within the taxonomy, it resides in the 'Automated Error Generation and Dataset Construction' leaf under 'Failure Attribution in LLM-Based Multi-Agent Systems'. This leaf contains only two papers total, indicating a relatively sparse research direction. The sibling work focuses on similar dataset construction challenges, suggesting this is an emerging area rather than a crowded subfield.

The taxonomy reveals that Aegis sits within a broader branch addressing failure attribution in LLM-based systems, which includes sibling leaves for trace analysis, counterfactual reasoning, and error pattern recognition. Neighboring branches tackle credit assignment in reinforcement learning (11 leaves, 30+ papers) and blame attribution frameworks (7 leaves), reflecting more mature research directions. Aegis diverges from these by focusing specifically on synthetic error injection for dataset creation rather than post-hoc analysis or reward-based credit assignment, occupying a distinct methodological niche at the intersection of debugging and data generation.

Among 27 candidates examined, the framework contribution shows one refutable candidate out of seven examined, while the dataset contribution (10 candidates examined) and learning methods contribution (10 candidates examined) show no clear refutations. The limited search scope means these statistics reflect top-K semantic matches rather than exhaustive coverage. The framework contribution appears to have the most substantial prior work overlap, whereas the large-scale dataset and multi-paradigm learning methods appear more distinctive within the examined candidate set. The relatively small number of refutable pairs across all contributions suggests moderate novelty given the search constraints.

Based on the limited literature search of 27 candidates, the work appears to occupy a sparsely populated research direction with only one sibling paper in its taxonomy leaf. The framework-level contribution shows some overlap with prior work, while the dataset scale and learning paradigm diversity appear less directly anticipated. However, the analysis covers top-K semantic matches rather than comprehensive field coverage, leaving open questions about related work in adjacent communities or recent preprints.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 1

Research Landscape Overview

Core task: error attribution in multi-agent systems. The field divides into several complementary branches that reflect different problem settings and methodological traditions. Failure Attribution in LLM-Based Multi-Agent Systems focuses on diagnosing breakdowns in language-model-driven agents, often through automated error generation and dataset construction, as seen in Aegis[0] and related work on attribution frameworks like Agent Failure Attribution[6].

Credit Assignment in Multi-Agent Reinforcement Learning addresses the classic challenge of distributing reward signals among cooperating or competing agents, employing techniques ranging from counterfactual reasoning (Counterfactual Policy Gradients[14]) to value decomposition and Shapley-based methods (Shapley Coop[15]). Credit Assignment in LLM-Based Multi-Agent Systems merges these traditions by applying credit-assignment ideas to language-model agents, as in Credit with Language Models[5] and LLM Explainable Credit[33].

Meanwhile, Blame and Responsibility Attribution Frameworks and Multi-Agent System Failure Analysis and Taxonomy offer more conceptual or normative perspectives, examining accountability structures (Placing Blame[1], Blame Attribution Accountability[28]) and taxonomies of failure modes (Why Systems Fail[4]). Multi-Agent System Design and Evaluation rounds out the landscape with broader architectural and benchmarking concerns.

Recent work highlights a tension between model-free credit-assignment heuristics and more interpretable, causality-driven approaches. Many studies in the reinforcement-learning branch pursue implicit or gradient-based methods (Implicit Credit Assignment[2], Improving Credit Assignment[3]), while newer LLM-oriented efforts emphasize explainability and traceability (AgenTracer[22], Role Specialized Traceability[30]).
Aegis[0] sits squarely within the Failure Attribution in LLM-Based Multi-Agent Systems branch, specifically targeting automated error generation to build datasets for diagnosing agent failures. Its emphasis on systematic error construction contrasts with neighboring attribution frameworks like Aegis Attribution[32], which may focus more on post-hoc analysis, and aligns with the broader push toward scalable, data-driven diagnostics in language-agent systems. This positioning reflects an emerging consensus that robust multi-agent systems require not only effective credit assignment during training but also principled failure-attribution mechanisms for debugging and accountability.

Claimed Contributions

Aegis framework for automated error generation and attribution in multi-agent systems

The authors propose Aegis, a framework that automatically generates error trajectories by injecting context-aware errors into successful multi-agent executions and programmatically labels faulty agents and error modes. This converts the manual annotation bottleneck into a scalable engineering problem.

Retrieved papers: 7
Verdict: Can Refute
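The injection-and-labeling loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Step`, `Trajectory`, and `toy_manipulate` names, the error-mode strings, and the stub that stands in for the manipulator LLM are all hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Step:
    agent: str      # which agent produced this turn
    content: str    # the agent's output at this turn

@dataclass
class Trajectory:
    steps: list
    faulty_agent: str = None   # ground-truth label, set only after injection
    error_mode: str = None     # ground-truth label, set only after injection

def inject_error(traj, manipulate, error_modes, seed=0):
    """Pick one step of a successful trajectory, rewrite it via the
    manipulator, and record the labels programmatically."""
    rng = random.Random(seed)
    idx = rng.randrange(len(traj.steps))
    mode = rng.choice(error_modes)
    original = traj.steps[idx]
    corrupted = Step(original.agent, manipulate(original.content, mode))
    new_steps = traj.steps[:idx] + [corrupted] + traj.steps[idx + 1:]
    return Trajectory(new_steps, faulty_agent=original.agent, error_mode=mode)

def toy_manipulate(content, mode):
    # Stand-in for a call to a manipulator LLM that rewrites the step
    # so that it exhibits the requested error mode in context.
    return f"[{mode}] {content}"

clean = Trajectory([Step("planner", "decompose the task"),
                    Step("coder", "implement the solution")])
faulty = inject_error(clean, toy_manipulate,
                      ["wrong_tool_call", "hallucinated_fact"])
```

Because the injector chooses the step and error mode itself, the faulty-agent and error-mode labels come for free, which is what turns annotation into an engineering problem.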
Large-scale dataset of 9,533 annotated error trajectories

The authors build a dataset substantially larger than prior resources, spanning six multi-agent system frameworks and six task domains. The dataset includes fine-grained labels and positive-negative sample pairs that enable multiple learning paradigms.

Retrieved papers: 10
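One plausible shape for the positive-negative pairs mentioned above is a record that keeps a successful trajectory next to its corrupted twin along with the attribution labels. The schema and field names below are illustrative assumptions, not the dataset's actual format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributionPair:
    """One positive-negative pair: a successful trajectory and its
    error-injected counterpart, with ground-truth attribution labels."""
    task_id: str
    positive: list      # agent turns of the successful run
    negative: list      # same run with exactly one injected error
    faulty_agent: str   # label: which agent was corrupted
    error_mode: str     # label: what kind of error was injected

pair = AttributionPair(
    task_id="math-0001",
    positive=["planner: split into subgoals", "solver: x = 4"],
    negative=["planner: split into subgoals", "solver: x = 7"],
    faulty_agent="solver",
    error_mode="calculation_error",
)
```

Keeping both trajectories in one record is what makes the data usable beyond supervised fine-tuning: the matched pair is exactly what a contrastive objective consumes.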
Learning methods across three paradigms for error attribution

The authors develop and validate learning methods for supervised fine-tuning, reinforcement learning with hierarchical rewards, and contrastive learning. These methods leverage the unique structure of the Aegis dataset to train models for error attribution in multi-agent systems.

Retrieved papers: 10
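The "hierarchical rewards" for the reinforcement-learning paradigm can be read as rewarding the coarse decision (which agent failed) before the fine one (which error mode). The function below is a hedged sketch of that idea; the weights and the gating rule are assumptions, not the paper's reward design.

```python
def hierarchical_reward(pred_agent, pred_mode, gold_agent, gold_mode,
                        agent_weight=0.5, mode_weight=0.5):
    """Two-level reward: the error-mode bonus is only reachable
    once the faulty agent has been identified correctly."""
    if pred_agent != gold_agent:
        return 0.0                      # wrong agent: no credit at all
    reward = agent_weight               # correct agent: base credit
    if pred_mode == gold_mode:
        reward += mode_weight           # correct mode on top: full credit
    return reward

# Correct agent and mode, correct agent only, wrong agent:
print(hierarchical_reward("coder", "syntax_error", "coder", "syntax_error"))
print(hierarchical_reward("coder", "logic_error", "coder", "syntax_error"))
print(hierarchical_reward("planner", "syntax_error", "coder", "syntax_error"))
```

Gating the mode reward on agent correctness keeps the policy from being rewarded for guessing a plausible error mode attached to the wrong agent.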

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Aegis framework for automated error generation and attribution in multi-agent systems

The authors propose Aegis, a framework that automatically generates error trajectories by injecting context-aware errors into successful multi-agent executions and programmatically labels faulty agents and error modes. This converts the manual annotation bottleneck into a scalable engineering problem.

Contribution

Large-scale dataset of 9,533 annotated error trajectories

The authors build a dataset substantially larger than prior resources, spanning six multi-agent system frameworks and six task domains. The dataset includes fine-grained labels and positive-negative sample pairs that enable multiple learning paradigms.

Contribution

Learning methods across three paradigms for error attribution

The authors develop and validate learning methods for supervised fine-tuning, reinforcement learning with hierarchical rewards, and contrastive learning. These methods leverage the unique structure of the Aegis dataset to train models for error attribution in multi-agent systems.