Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

Reinforcement Learning, Unsupervised Reinforcement Learning, Meta-Reinforcement Learning, Pre-training, Curriculum Learning

Abstract:

Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent’s post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent’s capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ULEE, an unsupervised meta-learning method combining in-context learning with adversarial goal generation to maintain training at the frontier of agent capabilities. It resides in the 'In-Context Learning with Adversarial Goal Curricula' leaf under Meta-Learning and Cross-Task Adaptation. Notably, this leaf contains only the original paper itself with no sibling papers, indicating a relatively sparse research direction within the taxonomy. The broader Meta-Learning branch includes three leaves total, suggesting this specific combination of in-context adaptation and adversarial curricula represents a less crowded niche compared to other goal-conditioned or curriculum learning approaches.

The taxonomy reveals neighboring work in adjacent branches. Visual Meta-RL Curricula explores automatic task distribution discovery from visual observations, while Cross-Embodiment methods focus on policy transfer across different agent bodies. The Goal-Conditioned and Skill Discovery branch contains related work on autonomous goal generation (Imagined Goals, Intrinsic Goal Exploration) and compositional skill learning, though these typically lack the meta-learning framework. Autonomous Curriculum Learning methods like Self-Supervised Curriculum and Automatic Curriculum share the curriculum generation theme but do not emphasize in-context adaptation. The taxonomy's scope notes clarify that adversarial curricula combined with in-context learning distinguish this work from non-adversarial or non-meta-learning approaches.

Of the 19 candidates examined, two were flagged as potentially refuting a claimed contribution, one for each of two contributions. For the post-adaptation task-difficulty metric, 10 candidates were examined and 1 was flagged, suggesting some precedent for difficulty-based curriculum guidance. For the empirical evaluation, 9 candidates were examined and 1 was flagged, indicating the experimental setup may overlap with existing benchmarks. No candidates were retrieved for the core ULEE method, likely because it represents the integrated system rather than a separable component. These statistics reflect a limited semantic search scope, not exhaustive coverage, so unexamined prior work may exist beyond the top-19 matches.

Based on the limited search scope of 19 candidates, the work appears to occupy a relatively novel position combining in-context meta-learning with adversarial goal curricula. The sparse taxonomy leaf and low refutation rates suggest meaningful differentiation from examined prior work, though the analysis cannot rule out relevant papers outside the top-19 semantic matches. The difficulty metric and evaluation components show more precedent than the integrated ULEE framework itself.
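The per-contribution counts quoted in this assessment can be cross-checked with a few lines of arithmetic. The sketch below only re-tallies the numbers reported above; the dictionary keys and the `tally` helper are illustrative names, not part of the report pipeline.

```python
# Counts as reported in this assessment: (candidates examined, flagged as refuting).
REPORTED = {
    "post-adaptation difficulty metric": (10, 1),
    "ULEE method": (0, 0),
    "empirical evaluation": (9, 1),
}


def tally(counts):
    """Return (total examined, total flagged, flagged fraction)."""
    examined = sum(n for n, _ in counts.values())
    flagged = sum(r for _, r in counts.values())
    rate = flagged / examined if examined else 0.0
    return examined, flagged, rate


examined, flagged, rate = tally(REPORTED)
print(f"{flagged}/{examined} candidates flagged ({rate:.1%})")
# prints "2/19 candidates flagged (10.5%)"
```

The totals agree with the 19 candidates stated above, and roughly one in ten examined candidates was flagged.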

Taxonomy

Research Landscape Overview

Core task: unsupervised pre-training of adaptive reinforcement learning policies through self-imposed goal curricula. The field addresses how agents can autonomously acquire reusable skills and representations without external task specifications, organizing itself into several complementary branches. Meta-Learning and Cross-Task Adaptation focuses on learning policies that generalize across diverse tasks, often through in-context learning or cross-embodiment transfer as seen in Peac Cross-Embodiment[1] and Visual Meta-RL Curricula[3]. Goal-Conditioned and Skill Discovery Methods emphasize discovering meaningful subgoals and skills, with works like Imagined Goals[2] and Intrinsic Goal Exploration[6] proposing mechanisms for autonomous goal generation. Autonomous Curriculum Learning without Task Knowledge develops strategies for progressively structuring learning experiences, exemplified by Self-Supervised Curriculum[4] and Automatic Curriculum[12]. Exploration and Intrinsic Motivation Strategies tackle the challenge of driving exploration through curiosity or surprise signals, while Model-Based Planning and Representation Learning investigates how world models can support goal-directed behavior. Domain Adaptation and Transfer Learning addresses cross-domain generalization, and Domain-Specific Applications with Adaptive Curricula applies these principles to specialized settings like Off-Road Path Planning[10].

A particularly active line of work centers on how agents construct and sequence their own learning objectives. Some approaches leverage adversarial or self-play mechanisms to generate challenging goal distributions, as in Self-Play Goal Embeddings[16], while others use curriculum sampling strategies like SAR Curriculum Sampling[8] or masking techniques such as Curriculum Masking[5] to control task difficulty.

Self-Imposed Goals[0] sits within the Meta-Learning and Cross-Task Adaptation branch, specifically focusing on in-context learning with adversarial goal curricula. This positions it closely alongside Visual Meta-RL Curricula[3], which similarly explores curriculum-driven meta-learning, though Self-Imposed Goals[0] emphasizes adversarial goal generation to drive adaptation. Compared to Peac Cross-Embodiment[1], which targets cross-embodiment transfer, Self-Imposed Goals[0] concentrates more on the curriculum design aspect within a single embodiment context. The central tension across these branches remains balancing exploration breadth with sample efficiency, and determining how much structure to impose versus discover autonomously.

Claimed Contributions

Post-adaptation task-difficulty metric for unsupervised goal generation

Can Refute

10 retrieved papers

The authors propose a novel metric that evaluates goal difficulty based on the agent's performance after an adaptation budget, rather than immediate performance. This metric guides the curriculum by focusing training on goals that are challenging yet achievable after adaptation, aligning better with evaluation settings where policies must adapt to novel tasks.
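To make the distinction between immediate and post-adaptation difficulty concrete, here is a minimal sketch. It is not the paper's formula, which this report does not give: the function names, the `eval_window` parameter, and the choice of `1 - mean success` as the score are all assumptions introduced for illustration.

```python
from statistics import mean
from typing import Sequence


def post_adaptation_difficulty(episode_successes: Sequence[float], eval_window: int = 1) -> float:
    """Hypothetical difficulty score in [0, 1] for a single goal.

    episode_successes: per-episode success values in [0, 1], in order, for
        consecutive episodes the agent spends on the same goal.
    eval_window: number of final episodes treated as "post-adaptation"
        (an assumed parameter, not taken from the paper).
    """
    if not episode_successes:
        raise ValueError("need at least one episode")
    if not 1 <= eval_window <= len(episode_successes):
        raise ValueError("eval_window must be between 1 and the number of episodes")
    # Only the last episodes count; earlier ones are the adaptation budget.
    return 1.0 - mean(episode_successes[-eval_window:])


def immediate_difficulty(episode_successes: Sequence[float]) -> float:
    """Baseline for contrast: difficulty judged from the first episode only."""
    if not episode_successes:
        raise ValueError("need at least one episode")
    return 1.0 - episode_successes[0]


# A goal the agent fails at first but solves after two exploratory episodes.
successes = [0.0, 0.0, 1.0, 1.0]
print(immediate_difficulty(successes))            # 1.0
print(post_adaptation_difficulty(successes, 2))   # 0.0
```

Under these assumptions, a goal that looks impossible from the first episode can still score as easy once the adaptation budget is spent, which is the property the contribution description emphasizes.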

ULEE: an unsupervised meta-learning method with adversarial goal curriculum

0 retrieved papers

ULEE is a complete system that combines meta-learning with automatic curriculum generation. It trains an in-context learning policy using self-generated goals, employs an adversarial goal-search policy to propose challenging goals, uses a difficulty predictor to estimate post-adaptation performance, and samples goals at intermediate difficulty levels to maintain an effective training curriculum.
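The last step in that description, sampling goals at intermediate difficulty, can be sketched as follows. This is a simplified illustration, not ULEE's implementation: the difficulty band `[low, high]`, the fallback rule, and the function name are assumptions, and the predicted difficulties are supplied as a plain dictionary where the method uses a learned predictor.

```python
import random
from typing import Dict, Hashable, List, Optional


def sample_intermediate_goals(
    predicted_difficulty: Dict[Hashable, float],
    k: int,
    low: float = 0.2,
    high: float = 0.8,
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    """Pick up to k goals whose predicted difficulty lies in [low, high].

    predicted_difficulty maps each candidate goal to an estimated
    post-adaptation difficulty in [0, 1]. The band limits are assumed values.
    """
    rng = rng or random.Random()
    band = [g for g, d in predicted_difficulty.items() if low <= d <= high]
    if not band:
        # Assumed fallback: take the k goals closest to the middle of the band.
        mid = (low + high) / 2
        band = sorted(
            predicted_difficulty,
            key=lambda g: abs(predicted_difficulty[g] - mid),
        )[:k]
    # Sample without replacement so a batch has no duplicate goals.
    return rng.sample(band, min(k, len(band)))


candidates = {"goal_a": 0.05, "goal_b": 0.5, "goal_c": 0.95, "goal_d": 0.3}
batch = sample_intermediate_goals(candidates, k=2, rng=random.Random(0))
print(sorted(batch))  # ['goal_b', 'goal_d']
```

Goals that are already trivial (`goal_a`) or still out of reach after adaptation (`goal_c`) are filtered out, so training stays on goals near the edge of what the agent can currently do.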

Empirical evaluation demonstrating improved exploration, adaptation, and fine-tuning

Can Refute

9 retrieved papers

The authors conduct comprehensive experiments on XLand-MiniGrid benchmarks showing that ULEE pre-training yields superior zero-shot exploration, few-shot adaptation, and provides strong initialization for extended fine-tuning compared to learning from scratch, DIAYN pre-training, and alternative curricula. The method generalizes to novel objectives, environment dynamics, and map structures.

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Post-adaptation task-difficulty metric for unsupervised goal generation

[30] Automatic Goal Generation for Reinforcement Learning Agents (Can Refute)
[31] Latent learning progress drives autonomous goal selection in human reinforcement learning (Cannot Refute)
[32] Curious: intrinsically motivated modular multi-goal reinforcement learning (Cannot Refute)
[33] Automated Curriculum Learning for Neural Networks (Cannot Refute)
[34] Genet: automatic curriculum generation for learning adaptation in networking (Cannot Refute)
[35] Goal-Directed Story Generation: Augmenting Generative Language Models with Reinforcement Learning (Cannot Refute)
[36] Task-oriented dialog policy learning via deep reinforcement learning and automatic graph neural network curriculum learning (Cannot Refute)
[37] SGN-CIRL: Scene Graph-based Navigation with Curriculum, Imitation, and Reinforcement Learning (Cannot Refute)
[38] Automatic curriculum learning through value disagreement (Cannot Refute)
[39] Personalized task difficulty adaptation based on reinforcement learning (Cannot Refute)

Contribution: ULEE: an unsupervised meta-learning method with adversarial goal curriculum

(0 retrieved papers)

Contribution: Empirical evaluation demonstrating improved exploration, adaptation, and fine-tuning

[21] Human-timescale adaptation in an open-ended task space (Can Refute)
[22] Hypothesis Network Planned Exploration for Rapid Meta-Reinforcement Learning Adaptation (Cannot Refute)
[23] Data-efficient task generalization via probabilistic model-based meta reinforcement learning (Cannot Refute)
[24] Meta Dynamic Pricing: Transfer Learning Across Experiments (Cannot Refute)
[25] A lazy approach to long-horizon gradient-based meta-learning (Cannot Refute)
[26] Generalization in LLM Reasoning: A Meta-Learned Approach to Optimal Imitation and Exploration (Cannot Refute)
[27] Cross-Domain Reasoning Transfer in LLMs via Meta-Learned Imitation-Exploration Policies (Cannot Refute)
[28] Exploring Strategies for Personalized Radiation Therapy: Part III Identifying genetic determinants for Radiation Response with Meta Learning (Cannot Refute)
[29] Meta-Bayesian Active Explorer: Hierarchical bayesian meta-learning for sample-efficient manufacturing optimization with similarity-weighted transfer (Cannot Refute)
