Alita-G: Self-Evolving Generative Agent for Agent Generation

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

Keywords: Agent, Self-evolving

Abstract:

Large language models (LLMs) perform better when scaffolded into agents with memory, tools, and feedback. Beyond this, self-evolving agents have emerged, but current work largely limits adaptation to prompt rewriting or failure retries. To address this, we present Alita-G, a self-evolution framework that transforms a general-purpose agent into a domain expert by systematically generating, abstracting, and curating Model Context Protocol (MCP) tools. In this framework, a generalist agent executes a curated suite of target-domain tasks and synthesizes candidate MCPs from successful trajectories. These are then abstracted into parameterized primitives and consolidated into an MCP Box. At inference time, Alita-G performs retrieval-augmented MCP selection using each tool's description and use cases, and then executes an agent equipped with the MCP Executor. Across several benchmarks, including GAIA, PathVQA, and Humanity's Last Exam, Alita-G attains strong gains while reducing computation costs. On GAIA validation, it achieves 83.03% pass@1 and 89.09% pass@3, establishing a new state-of-the-art result while reducing mean tokens per example by approximately 15% relative to a strong baseline agent. Alita-G thus provides a principled pathway from generalist capability to reusable, domain-specific competence, improving both accuracy and efficiency on complex reasoning tasks.
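
The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes each MCP Box entry stores a name, a description, and a list of use cases. It ranks entries by bag-of-words cosine similarity between the task query and that text. This report does not give the paper's embedding model, scoring function, or cut-offs, so `MCPEntry`, `select_mcps`, `top_k`, `min_score`, and the stopword list are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stopword list so that function words do not dominate the match.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "from", "how", "many", "are", "is", "this", "what"}


@dataclass
class MCPEntry:
    """Hypothetical MCP Box record: tool name plus the text used for retrieval."""
    name: str
    description: str
    use_cases: list = field(default_factory=list)


def bag_of_words(text: str) -> Counter:
    # Term counts over lowercase alphanumeric tokens; a stand-in for a dense embedding.
    return Counter(t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_mcps(query: str, box: list, top_k: int = 2, min_score: float = 0.1) -> list:
    """Return up to top_k tool names whose description plus use cases best match the query."""
    q = bag_of_words(query)
    scored = [
        (cosine(q, bag_of_words(" ".join([e.description, *e.use_cases]))), e.name)
        for e in box
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps box order on ties
    return [name for score, name in scored[:top_k] if score >= min_score]


box = [
    MCPEntry("pdf_table_extractor", "Extract tables from PDF files", ["read a table from a pdf report"]),
    MCPEntry("youtube_transcript", "Fetch the transcript of a YouTube video", ["quote what a speaker says in a video"]),
    MCPEntry("unit_converter", "Convert between physical units", ["convert miles to kilometers"]),
]
print(select_mcps("How many rows are in the table in this PDF?", box, top_k=1))  # ['pdf_table_extractor']
print(select_mcps("Convert 5 miles to kilometers", box))  # ['unit_converter']
```

Replacing the term-count vectors with dense sentence embeddings would keep the same ranking structure. The `min_score` floor lets the agent run with no extra tools when nothing in the box is relevant.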

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Alita-G, a framework that transforms general-purpose agents into domain specialists by generating, abstracting, and curating Model Context Protocol (MCP) tools from successful task trajectories. Within the taxonomy, it resides in the Domain-Specific Tool Synthesis leaf under Tool and Capability Generation. Notably, this leaf contains only one paper (Alita-G itself), indicating a relatively sparse research direction focused specifically on autonomous tool synthesis for domain adaptation. The broader Tool and Capability Generation branch includes just two leaves, suggesting this area is less crowded than the Self-Improvement Mechanisms or Application Domains branches.

The taxonomy shows that neighboring work falls primarily under Self-Improvement Mechanisms, particularly Prompt and Workflow Evolution, and under Application Domains such as Web Interaction and Scientific Research. Agentic Workflow Optimization addresses multi-step agent design, and Generalist Capability Expansion covers action-space learning across diverse tasks. Alita-G diverges from both by emphasizing tool-level abstraction and retrieval-augmented selection rather than workflow topology or general skill acquisition. The scope note for Domain-Specific Tool Synthesis explicitly excludes general-purpose capability learning, positioning Alita-G as a specialized approach to agent customization rather than broad self-improvement.

Across the thirty candidates examined, the contribution-level analysis gives mixed novelty signals. For the core self-evolution framework and for the MCP abstraction mechanism, ten candidates were examined each and none was judged to refute the claim. This suggests that these components face limited direct prior work within the search scope. By contrast, for the state-of-the-art GAIA performance claim, three of the ten candidates examined were judged able to refute it, indicating substantial existing competition on this benchmark. This pattern suggests that the methodological contributions appear more novel than the empirical results, although the analysis remains constrained by the top-thirty semantic search scope.

Based on this limited literature search, Alita-G occupies a sparsely populated niche within tool synthesis for domain specialization. The framework's emphasis on MCP generation and retrieval-augmented tool selection appears distinctive among the examined candidates, whereas the GAIA benchmark results face clearer precedent. The analysis does not exhaustively cover citation networks or the broader agent-evolution literature beyond the top-thirty matches. This leaves open questions about related work in tool learning and meta-learning.

Taxonomy

This LLM-generated taxonomy tree may contain errors and therefore requires manual review; it could include omissions or duplicates.

Research Landscape Overview

Core task: Self-evolving generative agent for domain-specific agent generation. The field is organized around five main branches that capture different facets of how agents improve and specialize over time. Self-Improvement Mechanisms and Architectures focuses on foundational techniques that let agents refine their own capabilities, often through iterative learning or recursive introspection. Multi-Agent Evolution and Collaboration examines how populations of agents co-evolve or coordinate to solve complex tasks, as in Self-Evolving Collaboration[9] and Multi-Agent Evolve[42]. Tool and Capability Generation addresses the synthesis of domain-specific tools, APIs, or skill libraries that agents can leverage or create autonomously. Application Domains and Specialized Systems covers deployments in areas such as healthcare, web navigation, and scientific research. Surveys, Foundations, and Conceptual Frameworks provides overarching perspectives on self-evolving systems, including Self-Evolving Survey[7] and Self-Evolving Comprehensive Survey[25].

One line of work within Tool and Capability Generation centers on domain-specific tool synthesis, where agents must generate or adapt specialized capabilities for narrow problem settings. This contrasts with broader self-improvement architectures that aim for general-purpose learning loops. Alita-G[0] sits squarely within this branch, emphasizing the automatic creation of domain-tailored agents rather than generic skill acquisition. OS-Copilot[1] targets operating-system-level tool use and SEW[2] focuses on web-based evolution, whereas Alita-G[0] prioritizes generating entire agent configurations for specific domains. This approach overlaps thematically with works such as VizGenie[15] and SceneWeaver[23], which also synthesize domain-specific artifacts. Alita-G[0] distinguishes itself by framing the agent itself as the generative output.

The central trade-off across these lines is between generality and specialization: whether to build broadly capable self-improving systems or to craft highly tuned generators for particular application niches.

Claimed Contributions

ALITA-G self-evolution framework for domain-specialist agent generation

10 retrieved papers

The authors introduce ALITA-G, a framework that automatically converts generalist agents into domain specialists by executing target tasks, harvesting successful MCPs from trajectories, abstracting them into reusable primitives, and organizing them into an MCP Box for retrieval-augmented tool selection at inference time.
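
The following is a minimal sketch of the harvesting and consolidation loop summarized above. It assumes each trajectory records a success flag and the candidate MCPs synthesized during the run. `Trajectory`, `MCPBox`, `harvest`, and the whitespace-normalized de-duplication key are hypothetical illustrations, not the authors' code. The abstraction step is passed in as a function and defaults to the identity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trajectory:
    """Hypothetical record of one target-domain task run by the generalist agent."""
    task_id: str
    success: bool
    mcps: list  # candidate tools synthesized during the run, as dicts with "name" and "code"


@dataclass
class MCPBox:
    """Hypothetical MCP Box: consolidated tools plus the tasks that produced each one."""
    tools: dict = field(default_factory=dict)
    provenance: dict = field(default_factory=dict)

    def add(self, mcp: dict, task_id: str) -> None:
        # Key on whitespace-normalized code so the same tool harvested from several
        # tasks is stored once; a real system might compare behaviour or semantics instead.
        key = " ".join(mcp["code"].split())
        self.tools.setdefault(key, mcp)
        self.provenance.setdefault(key, []).append(task_id)


def harvest(trajectories: list, box: MCPBox,
            abstract: Callable[[dict], dict] = lambda mcp: mcp) -> MCPBox:
    """Add tools from successful trajectories only, after an (assumed) abstraction step."""
    for traj in trajectories:
        if not traj.success:  # failed runs contribute no tools
            continue
        for mcp in traj.mcps:
            box.add(abstract(mcp), traj.task_id)
    return box


# Toy usage: the tool code strings are never executed, only compared.
runs = [
    Trajectory("t1", True, [{"name": "csv_sum", "code": "def f(p): return sum(load(p))"}]),
    Trajectory("t2", False, [{"name": "bad_tool", "code": "def g(): pass"}]),
    Trajectory("t3", True, [{"name": "csv_sum_v2", "code": "def f(p):  return sum(load(p))"}]),
]
box = harvest(runs, MCPBox())
print(len(box.tools))                 # 1
print(list(box.provenance.values()))  # [['t1', 't3']]
```

Recording provenance lets a curator see how often a tool was reused across tasks. The paper may or may not use such a signal during curation.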

MCP abstraction coupled with MCP-level retrieval-augmented generation

10 retrieved papers

The authors claim to be the first to combine MCP abstraction (distilling task-specific MCPs into reusable primitives) with MCP-level RAG (retrieving relevant MCPs at inference time) within a unified framework, yielding accuracy gains while reducing computational cost.
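
To make "distilling task-specific MCPs into reusable primitives" concrete, the sketch below lifts hard-coded literals out of a Python tool function and turns them into parameters. This report does not say how Alita-G performs abstraction, so this rule-based constant-lifting transformation is only a stand-in. `abstract_mcp`, `LiftConstants`, and the generated `arg0, arg1, ...` names are hypothetical. The sketch needs Python 3.9+ for `ast.unparse`.

```python
import ast


class LiftConstants(ast.NodeTransformer):
    """Replace str/int/float literals with fresh parameter names, recording the originals."""

    def __init__(self) -> None:
        self.lifted = []  # (parameter name, original literal) in visiting order

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        value = node.value
        # bool is a subclass of int, so exclude it explicitly; None and bytes are left alone.
        if isinstance(value, (str, int, float)) and not isinstance(value, bool):
            name = f"arg{len(self.lifted)}"
            self.lifted.append((name, value))
            return ast.copy_location(ast.Name(id=name, ctx=ast.Load()), node)
        return node


def abstract_mcp(source: str, new_name: str) -> tuple:
    """Rewrite one task-specific function into a parameterized primitive.

    Returns (new_source, lifted), where lifted maps each new parameter to the literal
    it replaced, so the original task is recovered by passing those values back.
    Sketch limits: a single top-level function with no docstring, f-strings,
    default arguments, or existing parameters named arg0, arg1, ...
    """
    tree = ast.parse(source)
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.FunctionDef):
        raise ValueError("expected exactly one function definition")
    func = tree.body[0]
    lifter = LiftConstants()
    func.body = [lifter.visit(stmt) for stmt in func.body]
    func.args.args.extend(ast.arg(arg=name, annotation=None) for name, _ in lifter.lifted)
    func.name = new_name
    ast.fix_missing_locations(tree)
    return ast.unparse(tree), dict(lifter.lifted)


TASK_SPECIFIC = '''
def count_mitochondria_mentions(text):
    return text.lower().count("mitochondria")
'''
src, lifted = abstract_mcp(TASK_SPECIFIC, "count_keyword")
print(src)
print(lifted)  # {'arg0': 'mitochondria'}

namespace = {}
exec(src, namespace)  # acceptable for this trusted toy string; never exec untrusted tool code
print(namespace["count_keyword"]("Mitochondria are organelles; mitochondria make ATP.", "mitochondria"))  # 2
```

Lifting every literal is deliberately blunt. An abstraction step that decides which constants are task-specific and which are part of the tool's logic would need more context than syntax alone.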

State-of-the-art performance on GAIA with reduced computational cost

Can Refute

10 retrieved papers

The authors report achieving new state-of-the-art results on the GAIA benchmark (83.03% pass@1 and 89.09% pass@3) while reducing mean tokens per example by approximately 15%, demonstrating both improved accuracy and efficiency through their method.
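
The two headline metrics can be stated precisely with a small sketch. It assumes that pass@k counts an example as solved when any of its first k attempts is correct; the report does not state which pass@k estimator was used. It also assumes the token saving is a relative reduction of the mean against the baseline. The function names and the toy data are illustrative and are not figures from the paper.

```python
def pass_at_k(attempts: list, k: int) -> float:
    """Fraction of examples solved by at least one of their first k attempts.

    attempts[i] is the ordered list of correctness flags for example i.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not attempts:
        raise ValueError("no examples")
    solved = sum(any(flags[:k]) for flags in attempts)
    return solved / len(attempts)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`; 0.15 means 15% fewer."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


# Illustrative toy data (not the paper's results): 4 examples, 3 attempts each.
attempts = [
    [True, True, True],
    [False, True, False],
    [False, False, False],
    [True, False, True],
]
print(pass_at_k(attempts, 1))                       # 0.5
print(pass_at_k(attempts, 3))                       # 0.75
print(round(relative_reduction(10_000, 8_500), 2))  # 0.15
```

Under this definition pass@k cannot decrease as k grows. This is consistent with the reported pass@3 (89.09%) exceeding pass@1 (83.03%).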

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it therefore appears structurally isolated. This is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

ALITA-G self-evolution framework for domain-specialist agent generation

[51] OpenAGI: When LLM Meets Domain Experts

Cannot Refute

[52] Biomni: A general-purpose biomedical AI agent

Cannot Refute

[53] Tool learning in the wild: Empowering language models as automatic tool agents

Cannot Refute

[54] SciToolAgent: a knowledge-graph-driven scientific agent for multitool integration

Cannot Refute

[55] Professional Agents--Evolving Large Language Models into Autonomous Experts with Human-Level Competencies

Cannot Refute

[56] The hitchhiker's guide to autonomous research: A survey of scientific agents

Cannot Refute

[57] Agora: A Distributed Language Model framework with API-call Support for Integrated Climate Forecasting

Cannot Refute

[58] Unifying Dynamic Tool Creation and Cross-Task Experience Sharing through Cognitive Memory Architecture

Cannot Refute

[59] Toward Agents of Intelligence: Bridging the AI Expertise Gap in Domain Sciences

Cannot Refute

[60] Towards an Agentic Workflow for Internet Measurement Research

Cannot Refute

Contribution

MCP abstraction coupled with MCP-level retrieval-augmented generation

[61] Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Cannot Refute

[62] ChatHuman: Language-driven 3D human understanding with retrieval-augmented tool reasoning

Cannot Refute

[63] ARPaCCino: An Agentic-RAG for Policy as Code Compliance

Cannot Refute

[64] Planning and editing what you retrieve for enhanced tool learning

Cannot Refute

[65] AutoAgent: A Fully-Automated and Zero-Code Framework for LLM Agents

Cannot Refute

[66] Everything is Context: Agentic File System Abstraction for Context Engineering

Cannot Refute

[67] SciAgent: Tool-augmented language models for scientific reasoning

Cannot Refute

[68] Graph RAG-Tool Fusion

Cannot Refute

[69] TURA: Tool-Augmented Unified Retrieval Agent for AI Search

Cannot Refute

[70] Retrieval Augmented Generation for Intelligent Querying of Databases and Documents

Cannot Refute

Contribution

State-of-the-art performance on GAIA with reduced computational cost

[71] Efficient agents: Building effective agents while reducing cost

Can Refute

[72] ARE: Scaling up agent environments and evaluations

Can Refute

[75] Affordable AI assistants with knowledge graph of thoughts

Can Refute

[73] Co-Sight: Enhancing LLM-based agents via conflict-aware meta-verification and trustworthy reasoning with structured facts

Cannot Refute

[74] AWorld: Orchestrating the training recipe for agentic AI

Cannot Refute

[76] CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs

Cannot Refute

[77] Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale

Cannot Refute

[78] Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems

Cannot Refute

[79] Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Cannot Refute

[80] Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Cannot Refute
