Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Reinforcement Learning, Unsupervised Reinforcement Learning, Meta-Reinforcement Learning, Pre-training, Curriculum Learning
Abstract:

Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent’s post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent’s capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ULEE, an unsupervised meta-learning method combining in-context learning with adversarial goal generation to maintain training at the frontier of agent capabilities. It resides in the 'In-Context Learning with Adversarial Goal Curricula' leaf under Meta-Learning and Cross-Task Adaptation. Notably, this leaf contains only the original paper itself with no sibling papers, indicating a relatively sparse research direction within the taxonomy. The broader Meta-Learning branch includes three leaves total, suggesting this specific combination of in-context adaptation and adversarial curricula represents a less crowded niche compared to other goal-conditioned or curriculum learning approaches.

The taxonomy reveals neighboring work in adjacent branches. Visual Meta-RL Curricula explores automatic task distribution discovery from visual observations, while Cross-Embodiment methods focus on policy transfer across different agent bodies. The Goal-Conditioned and Skill Discovery branch contains related work on autonomous goal generation (Imagined Goals, Intrinsic Goal Exploration) and compositional skill learning, though these typically lack the meta-learning framework. Autonomous Curriculum Learning methods like Self-Supervised Curriculum and Automatic Curriculum share the curriculum generation theme but do not emphasize in-context adaptation. The taxonomy's scope notes clarify that adversarial curricula combined with in-context learning distinguish this work from non-adversarial or non-meta-learning approaches.

Across the 19 candidates examined, two of the three contributions show potential overlap with prior work. For the post-adaptation task-difficulty metric, 10 candidates were examined and 1 was flagged as able to refute the novelty claim, suggesting some precedent for difficulty-based curriculum guidance. For the empirical evaluation, 9 candidates were examined and 1 was flagged as able to refute, indicating that the experimental setup may overlap with existing benchmarks. No candidates were retrieved for the core ULEE method, likely because it is an integrated system rather than a separable component. These statistics reflect a limited semantic search, not exhaustive coverage, so relevant prior work may exist beyond the top-19 matches.

Based on the limited search scope of 19 candidates, the work appears to occupy a relatively novel position combining in-context meta-learning with adversarial goal curricula. The sparse taxonomy leaf and low refutation rates suggest meaningful differentiation from examined prior work, though the analysis cannot rule out relevant papers outside the top-19 semantic matches. The difficulty metric and evaluation components show more precedent than the integrated ULEE framework itself.

Taxonomy

Core-task Taxonomy Papers: 20
Claimed Contributions: 3
Contribution Candidate Papers Compared: 19
Refutable Papers: 2

Research Landscape Overview

Core task: unsupervised pre-training of adaptive reinforcement learning policies through self-imposed goal curricula. The field addresses how agents can autonomously acquire reusable skills and representations without external task specifications, organizing itself into several complementary branches. Meta-Learning and Cross-Task Adaptation focuses on learning policies that generalize across diverse tasks, often through in-context learning or cross-embodiment transfer as seen in Peac Cross-Embodiment[1] and Visual Meta-RL Curricula[3]. Goal-Conditioned and Skill Discovery Methods emphasize discovering meaningful subgoals and skills, with works like Imagined Goals[2] and Intrinsic Goal Exploration[6] proposing mechanisms for autonomous goal generation. Autonomous Curriculum Learning without Task Knowledge develops strategies for progressively structuring learning experiences, exemplified by Self-Supervised Curriculum[4] and Automatic Curriculum[12]. Exploration and Intrinsic Motivation Strategies tackle the challenge of driving exploration through curiosity or surprise signals, while Model-Based Planning and Representation Learning investigates how world models can support goal-directed behavior. Domain Adaptation and Transfer Learning addresses cross-domain generalization, and Domain-Specific Applications with Adaptive Curricula applies these principles to specialized settings like Off-Road Path Planning[10].

A particularly active line of work centers on how agents construct and sequence their own learning objectives. Some approaches leverage adversarial or self-play mechanisms to generate challenging goal distributions, as in Self-Play Goal Embeddings[16], while others use curriculum sampling strategies like SAR Curriculum Sampling[8] or masking techniques such as Curriculum Masking[5] to control task difficulty. Self-Imposed Goals[0] sits within the Meta-Learning and Cross-Task Adaptation branch, specifically focusing on in-context learning with adversarial goal curricula. This positions it closely alongside Visual Meta-RL Curricula[3], which similarly explores curriculum-driven meta-learning, though Self-Imposed Goals[0] emphasizes adversarial goal generation to drive adaptation. Compared to Peac Cross-Embodiment[1], which targets cross-embodiment transfer, Self-Imposed Goals[0] concentrates more on the curriculum design aspect within a single embodiment context. The central tension across these branches remains balancing exploration breadth with sample efficiency, and determining how much structure to impose versus discover autonomously.

Claimed Contributions

Post-adaptation task-difficulty metric for unsupervised goal generation

The authors propose a novel metric that evaluates goal difficulty based on the agent's performance after an adaptation budget, rather than immediate performance. This metric guides the curriculum by focusing training on goals that are challenging yet achievable after adaptation, aligning better with evaluation settings where policies must adapt to novel tasks.

10 retrieved papers
Can Refute
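
To make the distinction between immediate and post-adaptation difficulty concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each goal is attempted over several multi-episode trials, that success is recorded as 0 or 1 per episode, and that the adaptation budget is counted in episodes; the function names and the example data are hypothetical.

```python
from statistics import mean


def immediate_difficulty(trials):
    """Difficulty judged from the first episode only (no adaptation)."""
    return 1.0 - mean([trial[0] for trial in trials])


def post_adaptation_difficulty(trials, budget):
    """Difficulty judged from the episode that follows `budget` adaptation episodes.

    `trials` is a list of trials; each trial is a list of 0/1 success flags,
    one per episode. This layout is an assumption made for illustration.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if any(len(trial) <= budget for trial in trials):
        raise ValueError("every trial needs more than `budget` episodes")
    return 1.0 - mean([trial[budget] for trial in trials])


# Hypothetical data: four trials of three episodes each on the same goal.
example_trials = [[0, 0, 1], [0, 1, 1], [0, 0, 0], [0, 0, 1]]
first_try = immediate_difficulty(example_trials)
after_two = post_adaptation_difficulty(example_trials, budget=2)
```

On this made-up data the goal looks unsolvable on the first attempt but is solved in three of four trials after two adaptation episodes, which is the kind of goal a post-adaptation metric would keep in the curriculum and an immediate metric would discard.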
ULEE: an unsupervised meta-learning method with adversarial goal curriculum

ULEE is a complete system that combines meta-learning with automatic curriculum generation. It trains an in-context learning policy using self-generated goals, employs an adversarial goal-search policy to propose challenging goals, uses a difficulty predictor to estimate post-adaptation performance, and samples goals at intermediate difficulty levels to maintain an effective training curriculum.

0 retrieved papers
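
The goal-selection step described above (sampling goals at intermediate difficulty levels) can be illustrated with a small sketch. It is an assumption-laden illustration rather than ULEE itself: the difficulty band of 0.3 to 0.7, the uniform sampling inside the band, the nearest-to-midpoint fallback, and the function name are all hypothetical, and the predicted difficulties are taken as given instead of coming from a learned predictor.

```python
import random


def sample_intermediate_goal(candidates, predicted_difficulty,
                             low=0.3, high=0.7, rng=None):
    """Pick a goal whose predicted post-adaptation difficulty is in [low, high].

    `candidates` and `predicted_difficulty` are parallel sequences. The band
    limits are illustrative defaults, not values taken from the paper.
    """
    if not candidates or len(candidates) != len(predicted_difficulty):
        raise ValueError("need equally sized, non-empty inputs")
    rng = rng or random.Random()
    pairs = list(zip(candidates, predicted_difficulty))
    in_band = [goal for goal, d in pairs if low <= d <= high]
    if in_band:
        # Uniform choice among goals that are neither trivial nor hopeless.
        return rng.choice(in_band)
    # Fallback: no goal is in the band, so take the one closest to its midpoint.
    midpoint = (low + high) / 2
    return min(pairs, key=lambda pair: abs(pair[1] - midpoint))[0]


# Hypothetical candidates proposed by a goal-search policy.
goals = ["reach_key", "open_door", "cross_map", "stack_boxes"]
difficulties = [0.05, 0.50, 0.95, 0.60]
chosen = sample_intermediate_goal(goals, difficulties, rng=random.Random(0))
```

Filtering to a band is only one way to keep training at the frontier of the agent's capabilities; a weighting scheme that favors mid-range difficulties without a hard cutoff would serve the same purpose.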
Empirical evaluation demonstrating improved exploration, adaptation, and fine-tuning

The authors conduct comprehensive experiments on XLand-MiniGrid benchmarks showing that ULEE pre-training yields superior zero-shot exploration, few-shot adaptation, and provides strong initialization for extended fine-tuning compared to learning from scratch, DIAYN pre-training, and alternative curricula. The method generalizes to novel objectives, environment dynamics, and map structures.

9 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.
