Skill Learning via Policy Diversity Yields Identifiable Representations for Reinforcement Learning

ICLR 2026 Conference Submission · Anonymous Authors
reinforcement learning, representation learning, identifiability, ICA, exploration, unsupervised skill discovery
Abstract:

Self-supervised feature learning and pretraining methods in reinforcement learning (RL) often rely on information-theoretic principles, collectively termed mutual information skill learning (MISL). These methods aim to learn a representation of the environment while also incentivizing its exploration. However, the role of the representation and of the mutual information parametrization in MISL is not yet well understood theoretically. Our work investigates MISL through the lens of identifiable representation learning by focusing on the Contrastive Successor Features (CSF) method. We prove that CSF recovers the environment's ground-truth features up to a linear transformation, owing to the inner product parametrization of the features and to skill diversity in a discriminative sense. This first identifiability guarantee for representation learning in RL also helps explain the implications of different mutual information objectives and the downsides of entropy regularizers. We empirically validate our claims in MuJoCo and DeepMind Control, showing that CSF recovers the ground-truth features both from states and from pixels.
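As a brief, hedged illustration of the abstract's central claim (the notation below is assumed for exposition and may differ from the paper's exact definitions): MISL methods with an inner product critic score state–skill pairs, and identifiability up to a linear transformation means the learned features agree with the ground-truth ones up to an invertible matrix.

```latex
% Illustrative sketch only; symbols are assumed, not taken from the paper.
% Inner product parametrization of the critic over states s and skills z:
%   f(s, z) = \phi(s)^\top z
% Identifiability up to a linear transformation: if \phi^* denotes the
% environment's ground-truth features and \hat{\phi} the CSF-learned ones,
\exists\, A \in \mathbb{R}^{d \times d} \text{ invertible s.t. }\quad
\hat{\phi}(s) = A\, \phi^*(s) \quad \text{for all states } s .
```

Under this reading, the guarantee is weaker than exact recovery but strong enough that any downstream linear probe on the learned features can express the same quantities as one on the ground-truth features.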

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's task and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper establishes the first identifiability guarantee for representation learning in reinforcement learning by analyzing the Contrastive Successor Features (CSF) method. It occupies the 'Identifiable Representation Recovery' leaf within the 'Theoretical Foundations and Identifiability' branch of the taxonomy. Notably, this leaf contains only the original paper itself, with no sibling papers present, indicating that provable recovery guarantees for ground-truth features in mutual information skill learning represent a sparse and emerging research direction within the field.

The taxonomy reveals that the broader 'Theoretical Foundations and Identifiability' branch contains one neighboring leaf focused on information-theoretic analysis for task adaptation, which examines skill diversity and separability rather than identifiability per se. Adjacent branches address disentangled representations, practical skill discovery methods, and application-specific techniques. The paper's theoretical lens on identifiability distinguishes it from these neighboring areas: while disentanglement methods seek factorized encodings and mutual information skill discovery emphasizes empirical performance, this work provides formal recovery guarantees that bridge theory and practice.

Among the thirty candidates examined through semantic search, none were found to refute any of the three core contributions. For the first contribution, the identifiability guarantee for CSF, ten candidates were examined with zero refutable matches. Similarly, the theoretical explanation of MISL success and the practical recommendations derived from analyzing MISL limitations were each compared against ten candidates without encountering overlapping prior work. This suggests that, within the limited search scope, the combination of identifiability theory, CSF analysis, and practical guidance appears relatively unexplored, though the modest search scale leaves open the possibility of relevant work beyond the top thirty semantic matches.

The analysis indicates that the paper occupies a novel position at the intersection of theoretical guarantees and mutual information skill learning, based on examination of thirty semantically related candidates. The absence of sibling papers in its taxonomy leaf and the lack of refutable prior work across all contributions suggest originality within the surveyed scope. However, the limited search scale means this assessment reflects top-ranked semantic matches rather than exhaustive coverage of the broader reinforcement learning theory literature.

Taxonomy

- Core-task Taxonomy Papers: 11
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Papers: 0

Research Landscape Overview

Core task: identifiable representation learning for reinforcement learning through mutual information skill learning. The field centers on discovering and leveraging latent skill representations that agents can learn without external supervision, often by maximizing mutual information between skills and states or trajectories. The taxonomy reveals several major branches:

- Theoretical Foundations and Identifiability: examines when and how learned representations can be uniquely recovered;
- Disentangled Representation Learning: seeks factorized skill encodings;
- Mutual Information Skill Discovery: encompasses methods that directly optimize information-theoretic objectives;
- Skill Diversity and Coverage: focuses on ensuring broad behavioral repertoires;
- Generalization and Meta-Learning: addresses transfer across tasks;
- Multi-Agent Skill Discovery: extends these ideas to coordinated settings;
- Application-Specific Methods: tailors skill learning to particular domains.

Representative works such as Contrastive Intrinsic Control[4] and Coordination Skill Discovery[5] illustrate how mutual information objectives can be instantiated in single-agent and multi-agent contexts, while Disentangled Skill Discovery[2] highlights the push toward interpretable factorizations. A particularly active line of inquiry concerns the theoretical guarantees underlying skill recovery: whether learned representations correspond to true underlying factors, and under what conditions identifiability holds. Policy Diversity Identifiable[0] sits squarely within the Theoretical Foundations and Identifiability branch, specifically addressing identifiable representation recovery. It contrasts with works such as Conditional Mutual Information[1], which also explores information-theoretic conditions for identifiability, and Disentangled Skill Discovery[2], which emphasizes disentanglement but may not provide the same formal recovery guarantees.
Meanwhile, application-oriented methods such as CrossLoco[3] and Language Conditioned Skills[8] demonstrate how skill learning scales to complex locomotion and language-grounded tasks, raising open questions about balancing theoretical rigor with practical expressiveness. The interplay between provable identifiability and empirical skill utility remains a central tension across these branches.

Claimed Contributions

First identifiability guarantee for representation learning in RL via CSF

The authors prove that Contrastive Successor Features (CSF) can recover the ground-truth states of a POMDP up to a linear transformation. This is the first identifiability result for representation learning in reinforcement learning, achieved through inner product parametrization and diverse skill-conditioned policies.

10 retrieved papers
Theoretical explanation of MISL success through identifiability lens

The authors provide a theoretical framework explaining why mutual information skill learning methods work by connecting them to identifiable representation learning theory. They show that the combination of diverse policies and inner product parametrization enables learning meaningful state representations.

10 retrieved papers
Practical recommendations from theoretical analysis of MISL limitations

The authors derive practical insights from their identifiability analysis, including quantifying policy diversity requirements, explaining why maximum-entropy policies are suboptimal for skill learning, and clarifying why feature parametrization matters in mutual information skill learning methods.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the currently retrieved top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: First identifiability guarantee for representation learning in RL via CSF. Ten candidate papers were retrieved and compared; none refuted the claim.

Contribution 2: Theoretical explanation of MISL success through an identifiability lens. Ten candidate papers were retrieved and compared; none refuted the claim.

Contribution 3: Practical recommendations from theoretical analysis of MISL limitations. Ten candidate papers were retrieved and compared; none refuted the claim.
