Aegis: Automated Error Generation and Identification for Multi-Agent Systems

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Multi-Agent Systems; Failure Attribution; Automated Data Generation; Learning
Abstract:

Large language model (LLM)-based multi-agent systems (MAS) have unlocked significant advances in tackling complex problems, but their increasing capability introduces a structural fragility that makes them difficult to debug. A key obstacle to improving their reliability is the severe scarcity of large-scale, diverse datasets for error attribution, as existing resources rely on costly and unscalable manual annotation. To address this bottleneck, we introduce Aegis, a novel framework for Automated error generation and attribution for multi-agent systems. Aegis constructs a large dataset of 9,533 trajectories with annotated faulty agents and error modes, covering diverse MAS architectures and task domains. This is achieved using an LLM-based manipulator that adaptively injects context-aware errors into successful execution trajectories. Leveraging the fine-grained labels and the structured arrangement of positive-negative sample pairs, Aegis supports three learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning, and we develop learning methods for each. Comprehensive experiments show that the trained models consistently achieve substantial improvements in error attribution. Notably, several of our fine-tuned LLMs perform competitively with, or better than, proprietary models an order of magnitude larger, validating our automated data-generation framework as a crucial resource for developing more robust and interpretable multi-agent systems.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Aegis, a framework for automated error generation and attribution in LLM-based multi-agent systems, producing 9,533 annotated trajectories with faulty agents and error modes. Within the taxonomy, it resides in the 'Automated Error Generation and Dataset Construction' leaf under 'Failure Attribution in LLM-Based Multi-Agent Systems'. This leaf contains only two papers total, indicating a relatively sparse research direction. The sibling work focuses on similar dataset construction challenges, suggesting this is an emerging area rather than a crowded subfield.

The taxonomy reveals that Aegis sits within a broader branch addressing failure attribution in LLM-based systems, which includes sibling leaves for trace analysis, counterfactual reasoning, and error pattern recognition. Neighboring branches tackle credit assignment in reinforcement learning (11 leaves, 30+ papers) and blame attribution frameworks (7 leaves), reflecting more mature research directions. Aegis diverges from these by focusing specifically on synthetic error injection for dataset creation rather than post-hoc analysis or reward-based credit assignment, occupying a distinct methodological niche at the intersection of debugging and data generation.

Among 27 candidates examined, the framework contribution shows one refutable candidate out of seven examined, while the dataset contribution (10 candidates examined) and learning methods contribution (10 candidates examined) show no clear refutations. The limited search scope means these statistics reflect top-K semantic matches rather than exhaustive coverage. The framework contribution appears to have the most substantial prior work overlap, whereas the large-scale dataset and multi-paradigm learning methods appear more distinctive within the examined candidate set. The relatively small number of refutable pairs across all contributions suggests moderate novelty given the search constraints.

Based on the limited literature search of 27 candidates, the work appears to occupy a sparsely populated research direction with only one sibling paper in its taxonomy leaf. The framework-level contribution shows some overlap with prior work, while the dataset scale and learning paradigm diversity appear less directly anticipated. However, the analysis covers top-K semantic matches rather than comprehensive field coverage, leaving open questions about related work in adjacent communities or recent preprints.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 1

Research Landscape Overview

Core task: error attribution in multi-agent systems. The field divides into several complementary branches that reflect different problem settings and methodological traditions. Failure Attribution in LLM-Based Multi-Agent Systems focuses on diagnosing breakdowns in language-model-driven agents, often through automated error generation and dataset construction, as seen in Aegis[0] and related work on attribution frameworks like Agent Failure Attribution[6].

Credit Assignment in Multi-Agent Reinforcement Learning addresses the classic challenge of distributing reward signals among cooperating or competing agents, employing techniques ranging from counterfactual reasoning (Counterfactual Policy Gradients[14]) to value decomposition and Shapley-based methods (Shapley Coop[15]). Credit Assignment in LLM-Based Multi-Agent Systems merges these traditions by applying credit-assignment ideas to language-model agents, as in Credit with Language Models[5] and LLM Explainable Credit[33].

Meanwhile, Blame and Responsibility Attribution Frameworks and Multi-Agent System Failure Analysis and Taxonomy offer more conceptual or normative perspectives, examining accountability structures (Placing Blame[1], Blame Attribution Accountability[28]) and taxonomies of failure modes (Why Systems Fail[4]). Multi-Agent System Design and Evaluation rounds out the landscape with broader architectural and benchmarking concerns.

Recent work highlights a tension between model-free credit-assignment heuristics and more interpretable, causality-driven approaches. Many studies in the reinforcement-learning branch pursue implicit or gradient-based methods (Implicit Credit Assignment[2], Improving Credit Assignment[3]), while newer LLM-oriented efforts emphasize explainability and traceability (AgenTracer[22], Role Specialized Traceability[30]).
Aegis[0] sits squarely within the Failure Attribution in LLM-Based Multi-Agent Systems branch, specifically targeting automated error generation to build datasets for diagnosing agent failures. Its emphasis on systematic error construction contrasts with neighboring attribution frameworks like Aegis Attribution[32], which may focus more on post-hoc analysis, and aligns with the broader push toward scalable, data-driven diagnostics in language-agent systems. This positioning reflects an emerging consensus that robust multi-agent systems require not only effective credit assignment during training but also principled failure-attribution mechanisms for debugging and accountability.

Claimed Contributions

Aegis framework for automated error generation and attribution in multi-agent systems

The authors propose Aegis, a framework that automatically generates error trajectories by injecting context-aware errors into successful multi-agent executions and programmatically labels faulty agents and error modes. This converts the manual annotation bottleneck into a scalable engineering problem.

Retrieved papers: 7
Verdict: Can Refute
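The injection-and-labeling loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Step`, `Trajectory`, and `toy_manipulate` names, the error-mode strings, and the stub that stands in for the manipulator LLM are all hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Step:
    agent: str      # which agent produced this turn
    content: str    # the agent's output at this turn

@dataclass
class Trajectory:
    steps: list
    faulty_agent: str = None   # ground-truth label, set only after injection
    error_mode: str = None     # ground-truth label, set only after injection

def inject_error(traj, manipulate, error_modes, seed=0):
    """Pick one step of a successful trajectory, rewrite it via the
    manipulator, and record the labels programmatically."""
    rng = random.Random(seed)
    idx = rng.randrange(len(traj.steps))
    mode = rng.choice(error_modes)
    original = traj.steps[idx]
    corrupted = Step(original.agent, manipulate(original.content, mode))
    new_steps = traj.steps[:idx] + [corrupted] + traj.steps[idx + 1:]
    return Trajectory(new_steps, faulty_agent=original.agent, error_mode=mode)

def toy_manipulate(content, mode):
    # Stand-in for a call to a manipulator LLM that rewrites the step
    # so that it exhibits the requested error mode in context.
    return f"[{mode}] {content}"

clean = Trajectory([Step("planner", "decompose the task"),
                    Step("coder", "implement the solution")])
faulty = inject_error(clean, toy_manipulate,
                      ["wrong_tool_call", "hallucinated_fact"])
```

Because the injector chooses the step and error mode itself, the faulty-agent and error-mode labels come for free, which is what turns annotation into an engineering problem.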
Large-scale dataset of 9,533 annotated error trajectories

The authors build a dataset substantially larger than prior resources, spanning six multi-agent system frameworks and six task domains. The dataset includes fine-grained labels and positive-negative sample pairs that enable multiple learning paradigms.

Retrieved papers: 10
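One plausible shape for the positive-negative pairs mentioned above is a record that keeps a successful trajectory next to its corrupted twin along with the attribution labels. The schema and field names below are illustrative assumptions, not the dataset's actual format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributionPair:
    """One positive-negative pair: a successful trajectory and its
    error-injected counterpart, with ground-truth attribution labels."""
    task_id: str
    positive: list      # agent turns of the successful run
    negative: list      # same run with exactly one injected error
    faulty_agent: str   # label: which agent was corrupted
    error_mode: str     # label: what kind of error was injected

pair = AttributionPair(
    task_id="math-0001",
    positive=["planner: split into subgoals", "solver: x = 4"],
    negative=["planner: split into subgoals", "solver: x = 7"],
    faulty_agent="solver",
    error_mode="calculation_error",
)
```

Keeping both trajectories in one record is what makes the data usable beyond supervised fine-tuning: the matched pair is exactly what a contrastive objective consumes.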
Learning methods across three paradigms for error attribution

The authors develop and validate learning methods for supervised fine-tuning, reinforcement learning with hierarchical rewards, and contrastive learning. These methods leverage the unique structure of the Aegis dataset to train models for error attribution in multi-agent systems.

Retrieved papers: 10
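The "hierarchical rewards" for the reinforcement-learning paradigm can be read as rewarding the coarse decision (which agent failed) before the fine one (which error mode). The function below is a hedged sketch of that idea; the weights and the gating rule are assumptions, not the paper's reward design.

```python
def hierarchical_reward(pred_agent, pred_mode, gold_agent, gold_mode,
                        agent_weight=0.5, mode_weight=0.5):
    """Two-level reward: the error-mode bonus is only reachable
    once the faulty agent has been identified correctly."""
    if pred_agent != gold_agent:
        return 0.0                      # wrong agent: no credit at all
    reward = agent_weight               # correct agent: base credit
    if pred_mode == gold_mode:
        reward += mode_weight           # correct mode on top: full credit
    return reward

# Correct agent and mode, correct agent only, wrong agent:
print(hierarchical_reward("coder", "syntax_error", "coder", "syntax_error"))
print(hierarchical_reward("coder", "logic_error", "coder", "syntax_error"))
print(hierarchical_reward("planner", "syntax_error", "coder", "syntax_error"))
```

Gating the mode reward on agent correctness keeps the policy from being rewarded for guessing a plausible error mode attached to the wrong agent.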

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Aegis framework for automated error generation and attribution in multi-agent systems

The authors propose Aegis, a framework that automatically generates error trajectories by injecting context-aware errors into successful multi-agent executions and programmatically labels faulty agents and error modes. This converts the manual annotation bottleneck into a scalable engineering problem.

Contribution

Large-scale dataset of 9,533 annotated error trajectories

The authors build a dataset substantially larger than prior resources, spanning six multi-agent system frameworks and six task domains. The dataset includes fine-grained labels and positive-negative sample pairs that enable multiple learning paradigms.

Contribution

Learning methods across three paradigms for error attribution

The authors develop and validate learning methods for supervised fine-tuning, reinforcement learning with hierarchical rewards, and contrastive learning. These methods leverage the unique structure of the Aegis dataset to train models for error attribution in multi-agent systems.