Improving Human-AI Coordination through Online Adversarial Training and Generative Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: multi-agent, adversarial training, zero-shot coordination, human-AI interaction, cooperation, reinforcement learning
Abstract:

Being able to cooperate with diverse humans is an important component of many economically valuable AI tasks, from household robotics to autonomous driving. However, generalizing to novel humans requires training on data that captures the diversity of human behaviors. Adversarial training is a promising method that allows dynamic data generation and ensures that agents are robust: it creates a feedback loop in which the agent's performance influences the generation of new adversarial data, which can be used immediately to train the agent. However, adversarial training is difficult to apply in a cooperative task: how can we train an adversarial cooperator? We propose a novel strategy that combines a pre-trained generative model, which simulates valid cooperative agent policies, with adversarial training that maximizes regret. We call our method GOAT: Generative Online Adversarial Training. In this framework, GOAT dynamically searches the latent space of the generative model for coordination strategies where the learning policy, the Cooperator agent, underperforms. GOAT enables better generalization by exposing the Cooperator to varied, challenging interaction scenarios. We maintain realistic coordination strategies by keeping the generative model frozen, thus avoiding adversarial exploitation. We evaluate GOAT with real human partners, and the results demonstrate state-of-the-art performance on the Overcooked benchmark, highlighting its effectiveness in generalizing to diverse human behaviors.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes GOAT, a method combining pre-trained generative models with adversarial training to create challenging cooperative partners that maximize regret during training. This work sits in the 'Adversarial and Generative Training Frameworks' leaf, which contains only two papers total. This is a notably sparse research direction within the broader taxonomy of 50 papers across 33 leaf nodes, suggesting that adversarial approaches to cooperative training remain relatively underexplored compared to self-play methods (which occupy multiple leaves with several papers each) or human data approaches.

The taxonomy reveals that GOAT's leaf sits within 'Training Paradigms for Human-Compatible Cooperation', adjacent to leaves focused on self-play diversity, behavioral cloning from human data, and zero-shot coordination. The sibling self-play approaches emphasize population diversity without adversarial objectives, while human data methods rely on behavioral cloning rather than dynamic adversarial generation. The zero-shot coordination leaf addresses partner unfamiliarity but without the training-time adversarial feedback loop that GOAT employs. This positioning suggests GOAT bridges adversarial robustness (common in competitive settings) with cooperative generalization, a combination that appears less densely populated in the field structure.

Among the 30 candidates examined, the contribution-level analysis found limited overlap with prior work. For the core GOAT method (Contribution 1), 10 candidates were examined and 1 appeared to refute it; for the regret-based adversarial objective (Contribution 2), 10 candidates were examined and 2 were judged refutable; for the Overcooked benchmark performance (Contribution 3), 10 candidates were examined and 1 was judged refutable. Within this limited search scope, these statistics indicate that most contributions face little direct prior work, though the regret-based objective shows slightly more overlap. The small number of refutable candidates across contributions suggests the approach occupies a relatively distinct position among the examined papers.

Based on the limited search of 30 semantically similar papers, GOAT appears to occupy a sparse research direction combining adversarial training with cooperative objectives. The taxonomy structure confirms this sparsity, with only one sibling paper in the same leaf. However, this assessment is constrained by the search scope and does not reflect exhaustive coverage of adversarial training literature outside the cooperative AI context or recent work not captured in the semantic search.

Taxonomy

Core-task Taxonomy Papers: 49
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 4

Research Landscape Overview

Core task: generalizing AI agents to cooperate with diverse human partners. The field addresses how artificial agents can learn to work effectively with a wide variety of human collaborators, each bringing different strategies, preferences, and communication styles. The taxonomy organizes research into several major branches:

- Training Paradigms for Human-Compatible Cooperation explores how agents are trained to handle partner diversity, including adversarial and generative frameworks that expose agents to varied partner behaviors during learning.
- Online Adaptation and Partner Modeling Mechanisms focuses on real-time adjustment strategies that allow agents to infer and respond to partner characteristics on the fly.
- Large Language Model-Based Cooperative Agents examines how foundation models enable flexible communication and reasoning about collaboration.
- Theoretical Frameworks provide formal models of coordination and convention formation.
- Empirical Studies investigate actual human-AI teaming outcomes.
- Application Domains demonstrate task-specific implementations.
- Emerging Paradigms capture cross-cutting innovations.

Works like Collaborating Without Data[1] and Diverse Conventions[29] illustrate how training diversity shapes generalization, while studies such as Mutual Theory Mind[9] and Bayesian Adaptation Teaming[50] exemplify partner-modeling approaches. A central tension runs through the field between training-time diversity and online adaptation: some approaches emphasize exposing agents to many partner types during training (e.g., Partner Diversification Methods[32], Diversity Specialist Partners[7]), while others prioritize rapid inference and adjustment mechanisms that work with minimal prior exposure (e.g., Adaptation Collective Teaming[2], Latent Partner Strategies[44]).
Adversarial Training Coordination[0] sits within the adversarial and generative training branch, emphasizing robustness through exposure to challenging partner behaviors during learning, a strategy that contrasts with purely self-play methods (Self Play Pitfalls[40]) and complements work on generative partner simulation like Generative Agents Cooperation[5]. This approach shares the training-diversity philosophy of Adversarial Training Coordination[19] but differs in emphasis from online adaptation methods that assume less about training-time partner coverage. The interplay between these paradigms reflects ongoing questions about whether generalization is best achieved through comprehensive training exposure or through more flexible, adaptive architectures.

Claimed Contributions

GOAT: Generative Online Adversarial Training method

The authors propose GOAT, which combines a pre-trained frozen generative model (VAE) with online regret-based adversarial training. The adversary searches the latent space of the generative model to find challenging cooperative partners that maximize the cooperator's regret, while the generative model ensures all partners remain cooperative and do not engage in sabotage.

9 retrieved papers; verdict: can refute.
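The loop described above (a frozen generative decoder, an adversary searching its latent space for high-regret partners, and a Cooperator trained against the result) can be sketched as follows. This is a minimal stdlib-only illustration under assumed interfaces: the decoder, the two return evaluators, the random latent search, and the "update" rule are hypothetical stand-ins, not the authors' implementation.

```python
import math
import random

random.seed(0)

def decode_partner(z):
    # Stand-in for the frozen VAE decoder: maps a latent vector to a
    # "partner policy", represented here as a simple preference vector.
    return [math.tanh(v) for v in z]

def self_play_return(p):
    # Assumed evaluator: return of the partner paired with a copy of itself.
    return sum(v * v for v in p)

def cross_play_return(p, coop):
    # Assumed evaluator: return of the partner paired with the Cooperator.
    return sum(a * b for a, b in zip(p, coop))

def regret(z, coop):
    # Cooperative regret: self-play return minus cross-play return.
    p = decode_partner(z)
    return self_play_return(p) - cross_play_return(p, coop)

def goat_step(coop, latent_dim=4, n_candidates=64, lr=0.1):
    # One iteration: search the latent space for a high-regret partner
    # (plain random search here; the paper's adversary may search
    # differently), then nudge the Cooperator toward that partner.
    zs = [[random.gauss(0.0, 1.0) for _ in range(latent_dim)]
          for _ in range(n_candidates)]
    worst = max(zs, key=lambda z: regret(z, coop))
    partner = decode_partner(worst)
    # Stand-in for an RL update that raises cross-play return.
    return [c + lr * p for c, p in zip(coop, partner)]

cooperator = [0.0] * 4
for _ in range(20):
    cooperator = goat_step(cooperator)
```

Because the generative model stays frozen, every candidate the adversary proposes is a decoded point of the pre-trained latent space, which is how the method keeps the adversarial partners within the set of plausible cooperative behaviors.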
Regret-based adversarial objective for cooperative training

The authors formalize regret in the cooperative setting as the performance gap between a partner's self-play performance and its cross-play performance with the cooperator. This objective encourages the adversary to find meaningful partner policies that could perform well but for which the cooperator underperforms, creating a dynamic curriculum without incentivizing sabotage.

9 retrieved papers; verdict: can refute.
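The regret definition above is a simple difference of returns, and the numbers below (purely hypothetical, not from the paper) show why it discourages sabotage: a partner that tanks the joint return also tanks its own self-play return, so it earns the adversary little regret.

```python
def cooperative_regret(self_play_return, cross_play_return):
    # Regret as described: the gap between what a partner achieves with
    # a copy of itself and what it achieves with the Cooperator.
    return self_play_return - cross_play_return

# Illustrative returns only: a competent partner that the Cooperator
# handles poorly yields high regret...
competent = cooperative_regret(200.0, 120.0)
# ...while a sabotaging partner scores badly even in self-play, so
# driving cross-play down buys the adversary little regret.
saboteur = cooperative_regret(30.0, 10.0)
assert competent == 80.0 and saboteur == 20.0
```

The adversary is therefore pushed toward partners that are genuinely capable but stylistically mismatched with the current Cooperator, which is exactly the curriculum the contribution claims.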
State-of-the-art performance on Overcooked benchmark with real humans

The authors conduct live evaluations with 40 real human participants on the Overcooked benchmark, demonstrating that GOAT achieves state-of-the-art cooperation performance compared to five competitive baselines, with particularly strong improvements (38%) on the more complex Multi-Strategy Counter layout.

9 retrieved papers; verdict: can refute.
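The 38% figure is most naturally read as a relative improvement over the strongest baseline on that layout. With purely hypothetical scores (the report does not include the raw returns), the computation would be:

```python
def relative_improvement(method_score, baseline_score):
    # Relative gain of one score over another, as a fraction.
    return (method_score - baseline_score) / baseline_score

# Hypothetical scores: 138 vs. 100 corresponds to a 38% improvement.
assert abs(relative_improvement(138.0, 100.0) - 0.38) < 1e-12
```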

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K retrieved core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though the signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1. GOAT: Generative Online Adversarial Training method

Contribution 2. Regret-based adversarial objective for cooperative training

Contribution 3. State-of-the-art performance on Overcooked benchmark with real humans