Improving Human-AI Coordination through Online Adversarial Training and Generative Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: multi-agent, adversarial training, zero-shot coordination, human-AI interaction, cooperation, reinforcement learning
Abstract:

Being able to cooperate with diverse humans is an important component of many economically valuable AI tasks, from household robotics to autonomous driving. However, generalizing to novel humans requires training on data that captures the diversity of human behaviors. Adversarial training is a promising method that allows dynamic data generation and ensures that agents are robust: it creates a feedback loop in which the agent's performance influences the generation of new adversarial data, which can be used immediately to train the agent. However, adversarial training is difficult to apply in a cooperative task: how can we train an adversarial cooperator? We propose a novel strategy that combines a pre-trained generative model, which simulates valid cooperative agent policies, with adversarial training that maximizes regret. We call our method GOAT: Generative Online Adversarial Training. In this framework, GOAT dynamically searches the latent space of the generative model for coordination strategies where the learning policy, the Cooperator agent, underperforms. GOAT enables better generalization by exposing the Cooperator to varied, challenging interaction scenarios. We maintain realistic coordination strategies by keeping the generative model frozen, thus avoiding adversarial exploitation. We evaluate GOAT with real human partners, and the results demonstrate state-of-the-art performance on the Overcooked benchmark, highlighting its effectiveness in generalizing to diverse human behaviors.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes GOAT, a method combining pre-trained generative models with adversarial training to create challenging cooperative partners that maximize regret during training. This work sits in the 'Adversarial and Generative Training Frameworks' leaf, which contains only two papers total. This is a notably sparse research direction within the broader taxonomy of 50 papers across 33 leaf nodes, suggesting that adversarial approaches to cooperative training remain relatively underexplored compared to self-play methods (which occupy multiple leaves with several papers each) or human data approaches.

The taxonomy reveals that GOAT's leaf sits within 'Training Paradigms for Human-Compatible Cooperation', adjacent to leaves focused on self-play diversity, behavioral cloning from human data, and zero-shot coordination. The sibling self-play approaches emphasize population diversity without adversarial objectives, while human data methods rely on behavioral cloning rather than dynamic adversarial generation. The zero-shot coordination leaf addresses partner unfamiliarity but without the training-time adversarial feedback loop that GOAT employs. This positioning suggests GOAT bridges adversarial robustness (common in competitive settings) with cooperative generalization, a combination that appears less densely populated in the field structure.

Among the 30 candidates examined, the contribution-level analysis found limited overlap with prior work. For the core GOAT method (Contribution 1), 10 candidates were examined and 1 appeared to refute it; for the regret-based adversarial objective (Contribution 2), 10 candidates were examined and 2 were judged refutable; for the Overcooked benchmark performance (Contribution 3), 10 candidates were examined and 1 was judged refutable. Within this limited search scope, these statistics indicate that most contributions face little direct prior work, though the regret-based objective shows slightly more overlap. The small number of refutable candidates across contributions suggests the approach occupies a relatively distinct position among the examined papers.

Based on the limited search of 30 semantically similar papers, GOAT appears to occupy a sparse research direction combining adversarial training with cooperative objectives. The taxonomy structure confirms this sparsity, with only one sibling paper in the same leaf. However, this assessment is constrained by the search scope and does not reflect exhaustive coverage of adversarial training literature outside the cooperative AI context or recent work not captured in the semantic search.

Taxonomy

Core-task Taxonomy Papers: 49
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 4

Research Landscape Overview

Core task: generalizing AI agents to cooperate with diverse human partners. The field addresses how artificial agents can learn to work effectively with a wide variety of human collaborators, each bringing different strategies, preferences, and communication styles. The taxonomy organizes research into several major branches:

- Training Paradigms for Human-Compatible Cooperation explores how agents are trained to handle partner diversity, including adversarial and generative frameworks that expose agents to varied partner behaviors during learning.
- Online Adaptation and Partner Modeling Mechanisms focuses on real-time adjustment strategies that allow agents to infer and respond to partner characteristics on the fly.
- Large Language Model-Based Cooperative Agents examines how foundation models enable flexible communication and reasoning about collaboration.
- Theoretical Frameworks provide formal models of coordination and convention formation.
- Empirical Studies investigate actual human-AI teaming outcomes.
- Application Domains demonstrate task-specific implementations.
- Emerging Paradigms capture cross-cutting innovations.

Works like Collaborating Without Data[1] and Diverse Conventions[29] illustrate how training diversity shapes generalization, while studies such as Mutual Theory Mind[9] and Bayesian Adaptation Teaming[50] exemplify partner-modeling approaches. A central tension runs through the field between training-time diversity and online adaptation: some approaches emphasize exposing agents to many partner types during training (e.g., Partner Diversification Methods[32], Diversity Specialist Partners[7]), while others prioritize rapid inference and adjustment mechanisms that work with minimal prior exposure (e.g., Adaptation Collective Teaming[2], Latent Partner Strategies[44]).
Adversarial Training Coordination[0] sits within the adversarial and generative training branch, emphasizing robustness through exposure to challenging partner behaviors during learning, a strategy that contrasts with purely self-play methods (Self Play Pitfalls[40]) and complements work on generative partner simulation like Generative Agents Cooperation[5]. This approach shares the training-diversity philosophy of Adversarial Training Coordination[19] but differs in emphasis from online adaptation methods that assume less about training-time partner coverage. The interplay between these paradigms reflects ongoing questions about whether generalization is best achieved through comprehensive training exposure or through more flexible, adaptive architectures.

Claimed Contributions

GOAT: Generative Online Adversarial Training method

The authors propose GOAT, which combines a pre-trained frozen generative model (VAE) with online regret-based adversarial training. The adversary searches the latent space of the generative model to find challenging cooperative partners that maximize the cooperator's regret, while the generative model ensures all partners remain cooperative and do not engage in sabotage.

9 retrieved papers; verdict: can refute.
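The loop described above (a frozen generative decoder, an adversary searching its latent space for high-regret partners, and a Cooperator trained against the result) can be sketched as follows. This is a minimal stdlib-only illustration under assumed interfaces: the decoder, the two return evaluators, the random latent search, and the "update" rule are hypothetical stand-ins, not the authors' implementation.

```python
import math
import random

random.seed(0)

def decode_partner(z):
    # Stand-in for the frozen VAE decoder: maps a latent vector to a
    # "partner policy", represented here as a simple preference vector.
    return [math.tanh(v) for v in z]

def self_play_return(p):
    # Assumed evaluator: return of the partner paired with a copy of itself.
    return sum(v * v for v in p)

def cross_play_return(p, coop):
    # Assumed evaluator: return of the partner paired with the Cooperator.
    return sum(a * b for a, b in zip(p, coop))

def regret(z, coop):
    # Cooperative regret: self-play return minus cross-play return.
    p = decode_partner(z)
    return self_play_return(p) - cross_play_return(p, coop)

def goat_step(coop, latent_dim=4, n_candidates=64, lr=0.1):
    # One iteration: search the latent space for a high-regret partner
    # (plain random search here; the paper's adversary may search
    # differently), then nudge the Cooperator toward that partner.
    zs = [[random.gauss(0.0, 1.0) for _ in range(latent_dim)]
          for _ in range(n_candidates)]
    worst = max(zs, key=lambda z: regret(z, coop))
    partner = decode_partner(worst)
    # Stand-in for an RL update that raises cross-play return.
    return [c + lr * p for c, p in zip(coop, partner)]

cooperator = [0.0] * 4
for _ in range(20):
    cooperator = goat_step(cooperator)
```

Because the generative model stays frozen, every candidate the adversary proposes is a decoded point of the pre-trained latent space, which is how the method keeps the adversarial partners within the set of plausible cooperative behaviors.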
Regret-based adversarial objective for cooperative training

The authors formalize regret in the cooperative setting as the performance gap between a partner's self-play performance and its cross-play performance with the cooperator. This objective encourages the adversary to find meaningful partner policies that could perform well but for which the cooperator underperforms, creating a dynamic curriculum without incentivizing sabotage.

9 retrieved papers; verdict: can refute.
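The regret definition above is a simple difference of returns, and the numbers below (purely hypothetical, not from the paper) show why it discourages sabotage: a partner that tanks the joint return also tanks its own self-play return, so it earns the adversary little regret.

```python
def cooperative_regret(self_play_return, cross_play_return):
    # Regret as described: the gap between what a partner achieves with
    # a copy of itself and what it achieves with the Cooperator.
    return self_play_return - cross_play_return

# Illustrative returns only: a competent partner that the Cooperator
# handles poorly yields high regret...
competent = cooperative_regret(200.0, 120.0)
# ...while a sabotaging partner scores badly even in self-play, so
# driving cross-play down buys the adversary little regret.
saboteur = cooperative_regret(30.0, 10.0)
assert competent == 80.0 and saboteur == 20.0
```

The adversary is therefore pushed toward partners that are genuinely capable but stylistically mismatched with the current Cooperator, which is exactly the curriculum the contribution claims.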
State-of-the-art performance on Overcooked benchmark with real humans

The authors conduct live evaluations with 40 real human participants on the Overcooked benchmark, demonstrating that GOAT achieves state-of-the-art cooperation performance compared to five competitive baselines, with particularly strong improvements (38%) on the more complex Multi-Strategy Counter layout.

9 retrieved papers; verdict: can refute.
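The 38% figure is most naturally read as a relative improvement over the strongest baseline on that layout. With purely hypothetical scores (the report does not include the raw returns), the computation would be:

```python
def relative_improvement(method_score, baseline_score):
    # Relative gain of one score over another, as a fraction.
    return (method_score - baseline_score) / baseline_score

# Hypothetical scores: 138 vs. 100 corresponds to a 38% improvement.
assert abs(relative_improvement(138.0, 100.0) - 0.38) < 1e-12
```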

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K retrieved core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though the signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1. GOAT: Generative Online Adversarial Training method

Contribution 2. Regret-based adversarial objective for cooperative training

Contribution 3. State-of-the-art performance on Overcooked benchmark with real humans