Abstract:

Large language models (LLMs) perform better when scaffolded into agents with memory, tools, and feedback. Beyond this, self-evolving agents have emerged, but current work largely limits adaptation to prompt rewriting or failure retries. Therefore, we present Alita-G, a self-evolution framework that transforms a general-purpose agent into a domain expert by systematically generating, abstracting, and curating Model Context Protocol (MCP) tools. In this framework, a generalist agent executes a curated suite of target-domain tasks and synthesizes candidate MCPs from successful trajectories. These are then abstracted into parameterized primitives and consolidated into an MCP Box. At inference time, Alita-G performs retrieval-augmented MCP selection using each tool’s description and use cases, and then executes an agent equipped with the MCP Executor. Across several benchmarks (GAIA, PathVQA, and Humanity's Last Exam), Alita-G attains strong gains while reducing computation costs. On GAIA validation, it achieves 83.03% pass@1 and 89.09% pass@3, establishing a new state-of-the-art result while reducing mean tokens per example by approximately 15% relative to a strong baseline agent. Alita-G thus provides a principled pathway from generalist capability to reusable, domain-specific competence, improving both accuracy and efficiency on complex reasoning tasks.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Alita-G, a framework that transforms general-purpose agents into domain specialists by generating, abstracting, and curating Model Context Protocol tools from successful task trajectories. Within the taxonomy, it resides in the Domain-Specific Tool Synthesis leaf under Tool and Capability Generation. Notably, this leaf contains only one paper—Alita-G itself—indicating a relatively sparse research direction focused specifically on autonomous tool synthesis for domain adaptation. The broader Tool and Capability Generation branch includes just two leaves, suggesting this area is less crowded than Self-Improvement Mechanisms or Application Domains.

The taxonomy reveals that neighboring work primarily falls under Self-Improvement Mechanisms, particularly Prompt and Workflow Evolution, and Application Domains such as Web Interaction and Scientific Research. While Agentic Workflow Optimization addresses multi-step agent design and Generalist Capability Expansion covers action-space learning across diverse tasks, Alita-G diverges by emphasizing tool-level abstraction and retrieval-augmented selection rather than workflow topology or general skill acquisition. The scope_note for Domain-Specific Tool Synthesis explicitly excludes general-purpose capability learning, positioning Alita-G as a specialized approach to agent customization rather than broad self-improvement.

Across the thirty candidates examined, the contribution-level analysis shows mixed novelty signals. For the core self-evolution framework and for the MCP abstraction mechanism, ten candidates were examined each and none could refute the claim, suggesting these components face limited direct prior work within the search scope. For the state-of-the-art GAIA performance claim, ten candidates were examined and three were found to be refutable instances, indicating that benchmark results in this space face substantial existing competition. This pattern suggests the methodological contributions are more novel than the empirical results, though the analysis remains constrained by the top-thirty semantic search scope.

Based on the limited literature search, Alita-G occupies a sparsely populated niche within tool synthesis for domain specialization. The framework's emphasis on MCP generation and retrieval-augmented tool selection appears distinctive among the examined candidates, though the GAIA benchmark results face clearer precedent. The analysis does not cover exhaustive citation networks or broader agent evolution literature beyond the top-thirty matches, leaving open questions about related work in tool learning and meta-learning domains.

Taxonomy

This LLM-generated taxonomy tree may contain errors and therefore requires manual review; it could include omissions or duplicates.
Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 3

Research Landscape Overview

Core task: Self-evolving generative agent for domain-specific agent generation. The field is organized around five main branches that capture different facets of how agents improve and specialize over time. Self-Improvement Mechanisms and Architectures focuses on foundational techniques for agents to refine their own capabilities, often through iterative learning or recursive introspection. Multi-Agent Evolution and Collaboration examines how populations of agents co-evolve or coordinate to solve complex tasks, as seen in works like Self-Evolving Collaboration[9] and Multi-Agent Evolve[42]. Tool and Capability Generation addresses the synthesis of domain-specific tools, APIs, or skill libraries that agents can leverage or create autonomously. Application Domains and Specialized Systems showcases deployments in areas such as healthcare, web navigation, and scientific research, while Surveys, Foundations, and Conceptual Frameworks provide overarching perspectives on self-evolving systems, including Self-Evolving Survey[7] and Self-Evolving Comprehensive Survey[25].

A particularly active line of work centers on domain-specific tool synthesis, where agents must generate or adapt specialized capabilities for narrow problem settings. This contrasts with broader self-improvement architectures that aim for general-purpose learning loops. Alita-G[0] sits squarely within the Tool and Capability Generation branch, emphasizing the automatic creation of domain-tailored agents rather than generic skill acquisition. Compared to OS-Copilot[1], which targets operating-system-level tool use, or SEW[2], which focuses on web-based evolution, Alita-G[0] prioritizes generating entire agent configurations for specific domains. This approach shares thematic overlap with works like VizGenie[15] and SceneWeaver[23], which also synthesize domain-specific artifacts, yet Alita-G[0] distinguishes itself by framing the agent itself as the generative output.

The central trade-off across these lines is between generality and specialization: whether to build broadly capable self-improving systems or to craft highly tuned generators for particular application niches.

Claimed Contributions

ALITA-G self-evolution framework for domain-specialist agent generation

The authors introduce ALITA-G, a framework that automatically converts generalist agents into domain specialists by executing target tasks, harvesting successful MCPs from trajectories, abstracting them into reusable primitives, and organizing them into an MCP Box for retrieval-augmented tool selection at inference time.
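
The harvesting and consolidation steps described above can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the authors' implementation: the `MCPEntry` and `Trajectory` types, the `build_mcp_box` function, and the `abstract` callback (a stand-in for the abstraction step, which defaults to the identity function) are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MCPEntry:
    """Hypothetical record for one tool kept in the MCP Box."""
    name: str
    description: str
    use_case: str
    source: str  # tool code, kept as plain text in this sketch


@dataclass
class Trajectory:
    """Hypothetical record of one task run by the generalist agent."""
    task: str
    success: bool
    candidate_mcps: List[MCPEntry] = field(default_factory=list)


def build_mcp_box(
    trajectories: List[Trajectory],
    abstract: Callable[[MCPEntry], MCPEntry] = lambda mcp: mcp,
) -> Dict[str, MCPEntry]:
    """Collect tools from successful runs only, abstract them, and
    keep the first entry seen for each tool name."""
    box: Dict[str, MCPEntry] = {}
    for traj in trajectories:
        if not traj.success:
            continue  # failed trajectories contribute no tools
        for mcp in traj.candidate_mcps:
            general = abstract(mcp)
            box.setdefault(general.name, general)
    return box


pdf_tool = MCPEntry("pdf_text", "Extract text from a PDF", "reading reports", "...")
web_tool = MCPEntry("web_fetch", "Download a web page", "looking up facts", "...")
runs = [
    Trajectory("summarize report", True, [pdf_tool]),
    Trajectory("find population", False, [web_tool]),
]
print(sorted(build_mcp_box(runs)))
```

Deduplicating by name is the simplest possible curation rule; the report does not say how the paper resolves overlapping tools, so this choice is a placeholder.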

10 retrieved papers
MCP abstraction coupled with MCP-level retrieval-augmented generation

The authors claim to be the first to combine MCP abstraction (distilling task-specific MCPs into reusable primitives) with MCP-level RAG (retrieving relevant MCPs at inference time) within a unified framework, yielding accuracy gains while reducing computational cost.

10 retrieved papers
State-of-the-art performance on GAIA with reduced computational cost

The authors report achieving new state-of-the-art results on the GAIA benchmark (83.03% pass@1 and 89.09% pass@3) while reducing mean tokens per example by approximately 15%, demonstrating both improved accuracy and efficiency through their method.
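
For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute them. It assumes pass@k means "at least one of the first k attempts on a task is correct" and that the token reduction is relative to the baseline mean; the report does not spell out the paper's exact definitions, and the function names and toy data are hypothetical.

```python
from typing import Sequence


def pass_at_k(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of tasks solved by at least one of the first k attempts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts:
        raise ValueError("need at least one task")
    solved = sum(1 for task in attempts if any(task[:k]))
    return solved / len(attempts)


def relative_token_reduction(baseline_mean: float, method_mean: float) -> float:
    """Share of the baseline's mean tokens per example that the method saves."""
    if baseline_mean <= 0:
        raise ValueError("baseline mean must be positive")
    return (baseline_mean - method_mean) / baseline_mean


# Toy outcomes for four tasks with three attempts each (not real results).
toy = [
    [True, True, True],
    [False, True, False],
    [False, False, True],
    [False, False, False],
]
print(f"pass@1 = {pass_at_k(toy, 1):.2%}")
print(f"pass@3 = {pass_at_k(toy, 3):.2%}")
print(f"token reduction = {relative_token_reduction(1000.0, 850.0):.0%}")
```

Under this definition pass@3 can never be lower than pass@1, which matches the ordering of the two reported figures.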

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.


Alita-G: Self-Evolving Generative Agent for Agent Generation | Novelty Validation