Agent Data Protocol

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: agent training data
Abstract:

Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue that the bottleneck is not a lack of underlying data sources, but that a large variety of data is fragmented across heterogeneous formats, tools, and interfaces. To this end, we introduce the Agent Data Protocol (ADP), a lightweight representation language that serves as an "interlingua" between agent datasets in diverse formats and unified agent training pipelines downstream. The design of ADP is expressive enough to capture a large variety of tasks, including API/tool use, browsing, coding, software engineering, and general agentic workflows, while remaining simple to parse and train on without per-dataset engineering. In experiments, we unified a broad collection of 13 existing agent training datasets into ADP format, and converted the standardized ADP data into training-ready formats for multiple agent frameworks. We performed supervised finetuning on the unified data, demonstrating an average performance gain of ~20% over corresponding base models and delivering state-of-the-art or near-SOTA performance on standard coding, browsing, tool use, and research benchmarks, without domain-specific tuning. All code and data are released publicly, in the hope that ADP can help lower the barrier to standardized, scalable, and reproducible agent training.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces the Agent Data Protocol (ADP), a lightweight interlingua designed to unify heterogeneous agent training datasets across diverse formats and interfaces. Within the taxonomy, it resides in the 'Agent-Specific Interlingua and Pipeline Unification' leaf, which contains three papers total. This leaf sits under the broader 'Unified Data Representation and Protocol Design' branch, indicating a moderately populated research direction focused on standardized schemas for agent data. The taxonomy reveals that protocol-level unification is an active but not overcrowded area, with sibling work like Agent Ohana addressing similar cross-domain challenges.

The taxonomy structure shows that the paper's approach contrasts with neighboring branches such as 'Multi-Source Data Integration for Prediction and Learning', which emphasizes sensor fusion and forecasting rather than training pipeline unification. The 'Fine-Tuning with Heterogeneous Feedback and Data Quality' branch addresses mixed-quality annotations but does not enforce a single representational standard, while 'Embodiment and Action Space Normalization' focuses on low-level motor command alignment. The scope notes clarify that ADP's interlingua design excludes domain-specific normalization or retrieval-augmented generation, positioning it as a protocol-first solution distinct from prediction-centric or embodiment-specific methods.

Among the three contributions analyzed, the Agent Data Protocol itself was examined against ten candidates, with three appearing to provide overlapping prior work. The unified collection of thirteen datasets and the empirical validation each faced ten candidates, with one refutable match per contribution. These statistics reflect a limited search scope of thirty total candidates examined, not an exhaustive literature review. The protocol contribution shows the most substantial prior overlap, suggesting that standardized agent data formats have been explored before, though the specific design choices and scope of ADP may differ. The dataset collection and validation contributions appear more novel within the examined sample.

Based on the top-thirty semantic matches and citation expansion, the work occupies a moderately explored niche within agent training standardization. The taxonomy indicates that while unified data representation is an established direction, the specific interlingua approach for agent pipelines remains relatively sparse. The analysis does not cover all possible prior work in agent learning or data harmonization, and a broader search might reveal additional overlapping efforts. The contribution-level statistics suggest incremental novelty in protocol design, with stronger differentiation in the empirical validation and dataset unification aspects.

Taxonomy

Core-task Taxonomy Papers: 11
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 5

Research Landscape Overview

Core task: Standardizing heterogeneous agent training datasets for supervised fine-tuning. The field addresses the challenge of unifying diverse agent training data—spanning web navigation, embodied robotics, multi-agent coordination, and other domains—into formats suitable for large-scale supervised learning.

The taxonomy reveals several complementary directions: Unified Data Representation and Protocol Design focuses on creating common schemas and interlinguas that allow disparate agent trajectories to be expressed in a shared format, as exemplified by Agent Data Protocol[0] and Agent Ohana[2]. Multi-Source Data Integration for Prediction and Learning tackles the fusion of heterogeneous sensor streams or behavioral logs to improve predictive models, while Fine-Tuning with Heterogeneous Feedback and Data Quality examines how to leverage mixed-quality annotations and varied reward signals during training. Embodiment and Action Space Normalization deals with aligning low-level motor commands across different robot morphologies, Multi-Agent Stochastic Policy Learning explores coordination under uncertainty, and Model Merging for Fine-Tuned Agent Generalization investigates techniques to combine separately fine-tuned agent policies into a single generalist system.

A particularly active line of work centers on protocol-level unification: Agent Data Protocol[0] and its closely related predecessor Agent Data Protocol[3] both propose standardized schemas that enable cross-domain agent datasets to be pooled and reused, reducing the overhead of domain-specific preprocessing. This contrasts with approaches like Heterogeneous Feedback[5], which emphasize handling variable annotation quality rather than enforcing a single representational standard. Meanwhile, Agent Ohana[2] demonstrates how a unified data layer can facilitate large-scale multi-task agent training by harmonizing action spaces and observation formats.
The original paper, Agent Data Protocol[0], sits squarely within the Unified Data Representation branch, offering an interlingua that bridges web-based and embodied agent tasks. Compared to Agent Ohana[2], which also targets cross-domain unification, Agent Data Protocol[0] places stronger emphasis on protocol design and pipeline modularity, making it easier to integrate new data sources incrementally. This work addresses a central bottleneck in agent learning: the lack of a common currency for expressing diverse agent experiences in a way that supervised fine-tuning can readily exploit.

Claimed Contributions

Agent Data Protocol (ADP)

ADP is a standardized schema implemented as Pydantic objects that unifies heterogeneous agent training datasets into a common format. It represents agent trajectories as sequences of actions (API, code, message) and observations (text, web), enabling conversion from diverse raw datasets to multiple agent frameworks without per-dataset engineering.

10 retrieved papers; verdict: Can Refute
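Based on the description above, the trajectory schema can be sketched in Python. This is a hedged illustration only: the paper implements ADP as Pydantic objects, and every class and field name below (e.g. `ApiAction`, `Trajectory.steps`) is a hypothetical stand-in rather than the released ADP schema; stdlib dataclasses are used here so the sketch is self-contained.

```python
from dataclasses import dataclass, field
from typing import Literal, Union

# Hypothetical names and fields; the released ADP schema (Pydantic-based,
# per the paper) may differ in naming, nesting, and validation details.

@dataclass
class ApiAction:                      # a structured tool/API call
    function: str
    arguments: dict = field(default_factory=dict)
    kind: Literal["api"] = "api"

@dataclass
class CodeAction:                     # code executed by the agent
    content: str
    language: str = "python"
    kind: Literal["code"] = "code"

@dataclass
class MessageAction:                  # natural-language message to the user
    content: str
    kind: Literal["message"] = "message"

@dataclass
class TextObservation:                # plain-text environment feedback
    content: str
    kind: Literal["text"] = "text"

@dataclass
class WebObservation:                 # browser state, e.g. page text
    content: str
    url: str = ""
    kind: Literal["web"] = "web"

Step = Union[ApiAction, CodeAction, MessageAction, TextObservation, WebObservation]

@dataclass
class Trajectory:
    id: str
    instruction: str                  # the task the agent was given
    steps: list                       # ordered interleaving of actions/observations
```

A converter from a raw dataset would then only need to emit `Trajectory` objects; downstream training pipelines consume the tagged `kind` field to render each step into a framework-specific format.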
Unified collection of 13 agent training datasets

The authors implemented converters to transform 13 pre-existing agent datasets (covering coding, software engineering, tool use, and web browsing) into the ADP standardized format, and further converted the ADP data into training formats for three different agent frameworks (OpenHands, SWE-Agent, AgentLab).

10 retrieved papers; verdict: Can Refute
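The second conversion stage described above (ADP to framework-specific training data) can be sketched as a small function that flattens an ADP-style trajectory into chat-format SFT turns. This is an illustrative sketch only, not the actual OpenHands, SWE-Agent, or AgentLab converters, and the dictionary keys (`instruction`, `steps`, `kind`, `content`) are assumptions:

```python
# Illustrative ADP -> chat-style SFT converter. Real per-framework
# converters handle tool-call serialization, truncation, and system
# prompts; key names here are hypothetical.

def to_chat_messages(trajectory: dict) -> list:
    """Flatten an ADP-style trajectory into role-tagged chat turns."""
    messages = [{"role": "user", "content": trajectory["instruction"]}]
    for step in trajectory["steps"]:
        if step["kind"] in ("api", "code", "message"):
            # agent actions become assistant turns
            messages.append({"role": "assistant", "content": step["content"]})
        else:
            # text/web observations become environment feedback turns
            messages.append({"role": "user", "content": step["content"]})
    return messages
```

Because every dataset is first normalized into the same step structure, one such function per target framework replaces per-dataset engineering.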
ADP Dataset V1 and empirical validation

The authors created and released the largest publicly available agent training dataset (1.3M trajectories) by unifying data through ADP. Supervised fine-tuning experiments show approximately 20% average improvement over base models and achieve state-of-the-art or near-SOTA results across multiple benchmarks without domain-specific tuning.

10 retrieved papers; verdict: Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Agent Data Protocol (ADP)
Contribution: Unified collection of 13 agent training datasets
Contribution: ADP Dataset V1 and empirical validation