Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning

ICLR 2026 Conference Submission · Anonymous Authors
reinforcement learning, representation learning, identifiability, ICA, exploration, unsupervised skill discovery
Abstract:

Self-supervised feature learning and pretraining methods in reinforcement learning (RL) often rely on information-theoretic principles, collectively termed mutual information skill learning (MISL). These methods aim to learn a representation of the environment while also incentivizing its exploration. However, the role of the representation and of the mutual information parametrization in MISL is not yet well understood theoretically. Our work investigates MISL through the lens of identifiable representation learning by focusing on the Contrastive Successor Features (CSF) method. We prove that CSF recovers the environment's ground-truth features up to a linear transformation, owing to the inner product parametrization of the features and to skill diversity in a discriminative sense. This first identifiability guarantee for representation learning in RL also helps explain the implications of different mutual information objectives and the downsides of entropy regularizers. We empirically validate our claims in MuJoCo and DeepMind Control, showing that CSF recovers the ground-truth features both from states and from pixels.
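As a brief, hedged illustration of the abstract's central claim (the notation below is assumed for exposition and may differ from the paper's exact definitions): MISL methods with an inner product critic score state–skill pairs, and identifiability up to a linear transformation means the learned features agree with the ground-truth ones up to an invertible matrix.

```latex
% Illustrative sketch only; symbols are assumed, not taken from the paper.
% Inner product parametrization of the critic over states s and skills z:
%   f(s, z) = \phi(s)^\top z
% Identifiability up to a linear transformation: if \phi^* denotes the
% environment's ground-truth features and \hat{\phi} the CSF-learned ones,
\exists\, A \in \mathbb{R}^{d \times d} \text{ invertible s.t. }\quad
\hat{\phi}(s) = A\, \phi^*(s) \quad \text{for all states } s .
```

Under this reading, the guarantee is weaker than exact recovery but strong enough that any downstream linear probe on the learned features can express the same quantities as one on the ground-truth features.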

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's task and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper establishes the first identifiability guarantee for representation learning in reinforcement learning by analyzing the Contrastive Successor Features (CSF) method. It occupies the 'Identifiable Representation Recovery' leaf within the 'Theoretical Foundations and Identifiability' branch of the taxonomy. Notably, this leaf contains only the original paper itself, with no sibling papers present, indicating that provable recovery guarantees for ground-truth features in mutual information skill learning represent a sparse and emerging research direction within the field.

The taxonomy reveals that the broader 'Theoretical Foundations and Identifiability' branch contains one neighboring leaf focused on information-theoretic analysis for task adaptation, which examines skill diversity and separability rather than identifiability per se. Adjacent branches address disentangled representations, practical skill discovery methods, and application-specific techniques. The paper's theoretical lens on identifiability distinguishes it from these neighboring areas: while disentanglement methods seek factorized encodings and mutual information skill discovery emphasizes empirical performance, this work provides formal recovery guarantees that bridge theory and practice.

Among the thirty candidates examined through semantic search, none were found to refute any of the three core contributions. For the first contribution, the identifiability guarantee for CSF, ten candidates were examined with zero refutable matches. Similarly, the theoretical explanation of MISL success and the practical recommendations derived from analyzing MISL limitations were each compared against ten candidates without encountering overlapping prior work. This suggests that, within the limited search scope, the combination of identifiability theory, CSF analysis, and practical guidance appears relatively unexplored, though the modest search scale leaves open the possibility of relevant work beyond the top thirty semantic matches.

The analysis indicates that the paper occupies a novel position at the intersection of theoretical guarantees and mutual information skill learning, based on examination of thirty semantically related candidates. The absence of sibling papers in its taxonomy leaf and the lack of refutable prior work across all contributions suggest originality within the surveyed scope. However, the limited search scale means this assessment reflects top-ranked semantic matches rather than exhaustive coverage of the broader reinforcement learning theory literature.

Taxonomy

- Core-task Taxonomy Papers: 11
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Papers: 0

Research Landscape Overview

Core task: identifiable representation learning for reinforcement learning through mutual information skill learning. The field centers on discovering and leveraging latent skill representations that agents can learn without external supervision, often by maximizing mutual information between skills and states or trajectories. The taxonomy reveals several major branches:

- Theoretical Foundations and Identifiability: examines when and how learned representations can be uniquely recovered;
- Disentangled Representation Learning: seeks factorized skill encodings;
- Mutual Information Skill Discovery: encompasses methods that directly optimize information-theoretic objectives;
- Skill Diversity and Coverage: focuses on ensuring broad behavioral repertoires;
- Generalization and Meta-Learning: addresses transfer across tasks;
- Multi-Agent Skill Discovery: extends these ideas to coordinated settings;
- Application-Specific Methods: tailors skill learning to particular domains.

Representative works such as Contrastive Intrinsic Control[4] and Coordination Skill Discovery[5] illustrate how mutual information objectives can be instantiated in single-agent and multi-agent contexts, while Disentangled Skill Discovery[2] highlights the push toward interpretable factorizations. A particularly active line of inquiry concerns the theoretical guarantees underlying skill recovery: whether learned representations correspond to true underlying factors, and under what conditions identifiability holds. Policy Diversity Identifiable[0] sits squarely within the Theoretical Foundations and Identifiability branch, specifically addressing identifiable representation recovery. It contrasts with works such as Conditional Mutual Information[1], which also explores information-theoretic conditions for identifiability, and Disentangled Skill Discovery[2], which emphasizes disentanglement but may not provide the same formal recovery guarantees.
Meanwhile, application-oriented methods such as CrossLoco[3] and Language Conditioned Skills[8] demonstrate how skill learning scales to complex locomotion and language-grounded tasks, raising open questions about balancing theoretical rigor with practical expressiveness. The interplay between provable identifiability and empirical skill utility remains a central tension across these branches.

Claimed Contributions

First identifiability guarantee for representation learning in RL via CSF

The authors prove that Contrastive Successor Features (CSF) can recover the ground-truth states of a POMDP up to a linear transformation. This is the first identifiability result for representation learning in reinforcement learning, achieved through inner product parametrization and diverse skill-conditioned policies.

10 retrieved papers
Theoretical explanation of MISL success through identifiability lens

The authors provide a theoretical framework explaining why mutual information skill learning methods work by connecting them to identifiable representation learning theory. They show that the combination of diverse policies and inner product parametrization enables learning meaningful state representations.

10 retrieved papers
Practical recommendations from theoretical analysis of MISL limitations

The authors derive practical insights from their identifiability analysis, including quantifying policy diversity requirements, explaining why maximum-entropy policies are suboptimal for skill learning, and clarifying why feature parametrization matters in mutual information skill learning methods.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the currently retrieved top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: First identifiability guarantee for representation learning in RL via CSF. Ten candidate papers were retrieved and compared; none refuted the claim.

Contribution 2: Theoretical explanation of MISL success through an identifiability lens. Ten candidate papers were retrieved and compared; none refuted the claim.

Contribution 3: Practical recommendations from theoretical analysis of MISL limitations. Ten candidate papers were retrieved and compared; none refuted the claim.
