TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: zero-shot reinforcement learning, unsupervised reinforcement learning, self-predictive representations, joint embedding predictive architecture
Abstract:

Latent prediction, where agents learn by predicting their own latents, has emerged as a powerful paradigm for training general representations in machine learning. In reinforcement learning (RL), this approach has been explored to define auxiliary losses for a variety of settings, including reward-based and unsupervised RL, behavior cloning, and world modeling. While existing methods are typically limited to single-task learning, one-step prediction, or on-policy trajectory data, we show that temporal difference (TD) learning makes it possible to learn representations predictive of long-term latent dynamics across multiple policies from offline, reward-free transitions. Building on this, we introduce TD-JEPA, which brings TD-based latent-predictive representations to unsupervised RL. TD-JEPA trains explicit state and task encoders, a policy-conditioned multi-step predictor, and a set of parameterized policies directly in latent space. This enables zero-shot optimization of any reward function at test time. Theoretically, we show that an idealized variant of TD-JEPA avoids collapse with proper initialization, and learns encoders that capture a low-rank factorization of long-term policy dynamics, while the predictor recovers their successor features in latent space. Empirically, TD-JEPA matches or outperforms state-of-the-art baselines on locomotion, navigation, and manipulation tasks across 13 datasets in ExoRL and OGBench, especially in the challenging setting of zero-shot RL from pixels.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces TD-JEPA, which applies temporal-difference learning to train latent-predictive representations for zero-shot reinforcement learning. It resides in the 'Self-Predictive Latent Representations' leaf, which contains five papers including the original work. This leaf sits within the broader 'Latent Dynamics Prediction and World Modeling' branch, indicating a moderately populated research direction focused on learning forward models in latent space. The sibling papers explore related themes such as compositional structure, disentanglement, and bootstrapping-based prediction, suggesting an active but not overcrowded subfield where different architectural and objective choices are still being explored.

The taxonomy reveals that TD-JEPA's leaf is adjacent to 'World Model-Based Planning and Control', which emphasizes model-predictive control rather than representation learning, and 'Reward-Free and Passive Data Learning', which focuses on learning from observational data without reward signals. The paper's emphasis on policy-conditioned multi-step prediction and zero-shot task adaptation also connects it to the 'Cross-Task and Multi-Task Generalization' branch, though it remains distinct by prioritizing latent dynamics over explicit task encoders. The taxonomy's scope and exclude notes clarify that TD-JEPA's focus on TD-based objectives differentiates it from planning-centric world models and from methods requiring reward signals during training.

Among 23 candidates examined across three contributions, none were flagged as clearly refuting the paper's claims. The first contribution (TD-based latent-predictive representations) examined three candidates with no refutations, suggesting limited prior work directly combining TD learning with policy-conditioned multi-step latent prediction. The second contribution (TD-JEPA algorithm) and third contribution (theoretical analysis) each examined ten candidates, again with no refutations. This indicates that within the limited search scope, the specific combination of TD objectives, explicit state and task encoders, and zero-shot optimization in latent space appears relatively unexplored, though the search scale is modest and may not capture all relevant prior work.

Based on the limited literature search of 23 candidates, TD-JEPA appears to occupy a distinct position within the self-predictive latent representations subfield. The absence of refutable prior work among examined candidates suggests novelty in its specific technical approach, though the search scope does not guarantee exhaustive coverage of related methods in successor features, world modeling, or unsupervised RL. The taxonomy context indicates the paper contributes to an active but not saturated research direction, where different strategies for learning predictive latent dynamics are still being actively developed and compared.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 23
Refutable Papers: 0

Research Landscape Overview

Core task: learning latent-predictive representations for zero-shot reinforcement learning. This field centers on building compact latent encodings that capture predictive structure in sequential decision problems, enabling agents to generalize to novel tasks or environments without task-specific fine-tuning.

The taxonomy reveals several complementary research directions. Latent Dynamics Prediction and World Modeling focuses on learning forward models that simulate future states in latent space, often through self-predictive objectives as seen in Self-Predictive Representations[2] and Bootstrap Latent-Predictive[43]. Generalization and Transfer Across Tasks and Domains addresses how learned representations can be reused across different problem instances, while Reward-Predictive and Task-Conditioned Representations emphasize encoding goal-relevant information. Unsupervised and Self-Supervised Representation Learning explores methods like Contrastive Predictive Coding[19] that extract structure without explicit reward signals. Cross-Modal and Multi-Modal Representation Learning tackles scenarios where agents must integrate diverse sensory inputs, and Uncertainty Quantification and Robustness examines how to handle distributional shift and model confidence.

Within the world modeling branch, a handful of works explore different strategies for self-prediction in latent space. Self-Predictive Combinatorial[1] investigates compositional structure, while Disentangled Predictive[45] aims to separate independent factors of variation. TD-JEPA[0] sits naturally in this cluster, emphasizing temporal-difference style objectives for learning predictive embeddings that support zero-shot transfer. Compared to Bootstrap Latent-Predictive[43], which relies on bootstrapping target networks, TD-JEPA[0] integrates temporal-difference learning more directly into the representation objective.

Meanwhile, Regularized Latent Dynamics[5] highlights the importance of regularization to prevent overfitting in learned world models. These contrasting approaches reflect ongoing questions about how best to balance predictive accuracy, computational efficiency, and generalization: whether to prioritize disentanglement, compositional reasoning, or robust temporal consistency when building latent representations for zero-shot RL.

Claimed Contributions

TD-based latent-predictive representations for multi-step, policy-conditioned dynamics

The authors introduce a novel temporal-difference loss for latent-predictive representation learning that models multi-step, policy-conditioned dynamics from offline data. Unlike prior methods limited to single-step prediction or on-policy data, this approach learns representations that capture long-term features relevant for value estimation across multiple policies.

3 retrieved papers
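Based on the abstract's description, the TD objective bootstraps a policy-conditioned predictor against next-state latents, much like successor-feature TD learning with a frozen target network. The sketch below is illustrative only: `encoder`, `predictor`, and all dimensions are assumed stand-ins, not the paper's actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4          # latent dimension (assumed, for illustration)
gamma = 0.98   # discount factor

def encoder(s):
    # stand-in state encoder phi(s); here just an identity map
    return np.eye(len(s)) @ s

def predictor(phi_s, z):
    # stand-in policy-conditioned predictor psi(s, z), linear in phi_s;
    # in practice this would be a learned network with a frozen target copy
    return phi_s + 0.1 * z

# one offline, reward-free transition (s, s') and a policy/task latent z
s, s_next = rng.normal(size=d), rng.normal(size=d)
z = rng.normal(size=d)

# TD target: next-state latent plus discounted bootstrapped prediction
# from the next state -- analogous to successor-feature TD learning
target = encoder(s_next) + gamma * predictor(encoder(s_next), z)
td_error = predictor(encoder(s), z) - target
loss = float(np.mean(td_error ** 2))
```

The bootstrapped target is what lets the representation capture multi-step, long-horizon latent dynamics from single transitions, rather than being limited to one-step prediction.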

TD-JEPA algorithm for zero-shot unsupervised RL

The authors propose TD-JEPA, a zero-shot unsupervised RL algorithm that jointly trains state encoders, task encoders, policy-conditioned predictors, and parameterized policies end-to-end from offline reward-free transitions. The method enables zero-shot optimization of any reward function at test time entirely in latent space.

10 retrieved papers
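The zero-shot step at test time can be sketched as regressing a task latent from a handful of reward-labelled states, in the style of forward-backward methods; the names `B`, `r`, and `z` below are our assumptions for illustration, not the paper's API.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 256, 8   # number of reward-labelled samples, task-latent dimension

# stand-in task-encoder outputs B(s) for n states, and their rewards r(s)
B = rng.normal(size=(n, d))
r = rng.normal(size=n)

# infer the task latent z such that B(s)^T z ~= r(s), via least squares;
# no gradient steps on the encoders or policies are needed at test time
z, *_ = np.linalg.lstsq(B, r, rcond=None)

# the pretrained policy conditioned on z would then be executed zero-shot
```

Because all components are pretrained from reward-free transitions, adapting to a new reward function reduces to this single linear regression in latent space.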

Theoretical analysis connecting TD-JEPA to successor features and policy evaluation

The authors provide theoretical guarantees showing that TD-JEPA with linear predictors avoids representation collapse, recovers a low-rank factorization of successor measures, and minimizes an upper bound on policy evaluation error. These results build on a novel gradient matching argument that generalizes existing analyses of latent-predictive representations.

10 retrieved papers
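In forward-backward-style notation (the symbols below are ours, not necessarily the paper's), the claimed low-rank factorization of successor measures and the recovery of successor features can be written as:

```latex
% Successor measure of policy \pi, factorized by the two encoders
% \psi_\pi (predictor side) and \phi (state encoder), w.r.t. a data
% distribution \rho -- a hedged sketch of the stated result:
M^{\pi}(s, X) \;=\; \sum_{t \ge 0} \gamma^{t}\, \Pr\!\left(s_{t} \in X \mid s_{0} = s, \pi\right)
\;\approx\; \int_{X} \psi_{\pi}(s)^{\top} \phi(s')\, \rho(\mathrm{d}s'),

% the predictor then recovers successor features in latent space:
\psi_{\pi}(s) \;\approx\; \mathbb{E}_{\pi}\!\Big[\, \sum_{t \ge 0} \gamma^{t}\, \phi(s_{t}) \;\Big|\; s_{0} = s \Big].
```

Under such a factorization, the value of any reward expressible as $r(s') \approx \phi(s')^{\top} z$ reduces to the inner product $\psi_{\pi}(s)^{\top} z$, which is what makes zero-shot policy evaluation in latent space possible.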

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: TD-based latent-predictive representations for multi-step, policy-conditioned dynamics

Contribution 2: TD-JEPA algorithm for zero-shot unsupervised RL

Contribution 3: Theoretical analysis connecting TD-JEPA to successor features and policy evaluation

As summarized in the Overview, none of the candidates examined for these three contributions refuted the paper's claims; the full contribution descriptions appear under Claimed Contributions above.
