TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning
Overview
Overall Novelty Assessment
The paper introduces TD-JEPA, which applies temporal-difference learning to train latent-predictive representations for zero-shot reinforcement learning. It resides in the 'Self-Predictive Latent Representations' leaf, which contains five papers including the original work. This leaf sits within the broader 'Latent Dynamics Prediction and World Modeling' branch, indicating a moderately populated research direction focused on learning forward models in latent space. The sibling papers explore related themes such as compositional structure, disentanglement, and bootstrapping-based prediction, suggesting an active but not overcrowded subfield where different architectural and objective choices are still being explored.
The taxonomy reveals that TD-JEPA's leaf is adjacent to 'World Model-Based Planning and Control', which emphasizes model-predictive control rather than representation learning, and 'Reward-Free and Passive Data Learning', which focuses on learning from observational data without reward signals. The paper's emphasis on policy-conditioned multi-step prediction and zero-shot task adaptation also connects it to the 'Cross-Task and Multi-Task Generalization' branch, though it remains distinct by prioritizing latent dynamics over explicit task encoders. The taxonomy's scope and exclude notes clarify that TD-JEPA's focus on TD-based objectives differentiates it from planning-centric world models and from methods requiring reward signals during training.
Among the 23 candidates examined across the three contributions, none was flagged as clearly refuting the paper's claims. For the first contribution (TD-based latent-predictive representations), three candidates were examined with no refutations, suggesting limited prior work directly combining TD learning with policy-conditioned multi-step latent prediction. Ten candidates were examined for each of the second contribution (the TD-JEPA algorithm) and the third (the theoretical analysis), again with no refutations. Within this limited search scope, the specific combination of TD objectives, explicit state and task encoders, and zero-shot optimization in latent space appears relatively unexplored, though the search scale is modest and may not capture all relevant prior work.
Based on the limited literature search of 23 candidates, TD-JEPA appears to occupy a distinct position within the self-predictive latent representations subfield. The absence of refutable prior work among examined candidates suggests novelty in its specific technical approach, though the search scope does not guarantee exhaustive coverage of related methods in successor features, world modeling, or unsupervised RL. The taxonomy context indicates the paper contributes to an active but not saturated research direction, where different strategies for learning predictive latent dynamics are still being actively developed and compared.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce a novel temporal-difference loss for latent-predictive representation learning that models multi-step, policy-conditioned dynamics from offline data. Unlike prior methods limited to single-step prediction or on-policy data, this approach learns representations that capture long-term features relevant for value estimation across multiple policies.
The authors propose TD-JEPA, a zero-shot unsupervised RL algorithm that jointly trains state encoders, task encoders, policy-conditioned predictors, and parameterized policies end-to-end from offline reward-free transitions. The method enables zero-shot optimization of any reward function at test time entirely in latent space.
The authors provide theoretical guarantees showing that TD-JEPA with linear predictors avoids representation collapse, recovers a low-rank factorization of successor measures, and minimizes an upper bound on policy evaluation error. These results build on a novel gradient matching argument that generalizes existing analyses of latent-predictive representations.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
[2] Data-Efficient Reinforcement Learning with Self-Predictive Representations
[43] Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
[45] Disentangled Predictive Representation for Meta-Reinforcement Learning
Contribution Analysis
Detailed comparisons for each claimed contribution
TD-based latent-predictive representations for multi-step, policy-conditioned dynamics
The authors introduce a novel temporal-difference loss for latent-predictive representation learning that models multi-step, policy-conditioned dynamics from offline data. Unlike prior methods limited to single-step prediction or on-policy data, this approach learns representations that capture long-term features relevant for value estimation across multiple policies.
[67] Multi-agent LLMs with Offline Reinforcement Learning for Hierarchical Multi-turn Decision-making
[68] Uncertainty-driven exploration in sparse model-based reinforcement learning
[69] TOPS: Transition-Based Volatility-Reduced Policy Search
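The TD objective this contribution describes can be sketched as a bootstrapped regression in latent space: a policy-conditioned predictor is trained toward its own target-network output plus the next state's features, so its fixed point approximates a discounted sum of future latent features. The sketch below is illustrative only; the function signatures, the policy embedding `z`, and the use of EMA-style target networks are assumptions, not the paper's implementation.

```python
import numpy as np

def latent_td_loss(encoder, predictor, target_encoder, target_predictor,
                   s, a, s_next, a_next, z, gamma=0.99):
    """TD-style latent prediction loss (illustrative sketch, not the paper's code).

    predictor(phi, a, z) is regressed toward the bootstrapped target
      target_encoder(s') + gamma * target_predictor(phi', a', z),
    so at a fixed point it approximates the discounted sum of future latent
    features under the policy indexed by the embedding z. In practice the
    target networks would be stop-gradient / EMA copies of the online ones,
    which is what prevents trivial collapse of the representation.
    """
    pred = predictor(encoder(s), a, z)
    phi_next = target_encoder(s_next)            # treated as a constant target
    target = phi_next + gamma * target_predictor(phi_next, a_next, z)
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))
```

Because the target is bootstrapped rather than a fixed next-step feature, the same offline transitions can supervise multi-step, policy-conditioned predictions for every policy embedding `z`, which is the distinction the contribution draws against single-step or on-policy methods.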
TD-JEPA algorithm for zero-shot unsupervised RL
The authors propose TD-JEPA, a zero-shot unsupervised RL algorithm that jointly trains state encoders, task encoders, policy-conditioned predictors, and parameterized policies end-to-end from offline reward-free transitions. The method enables zero-shot optimization of any reward function at test time entirely in latent space.
[7] Dynamics-Aligned Latent Imagination in Contextual World Models for Zero-Shot Generalization
[10] Diverse Policy Learning via Random Obstacle Deployment for Zero-Shot Adaptation
[29] CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion
[51] Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
[52] Learning to generalize with latent embedding optimization for few- and zero-shot cross domain fault diagnosis
[53] Constrained Skill Discovery: Quadruped Locomotion with Unsupervised Reinforcement Learning
[54] From Parameters to Behavior: Unsupervised Compression of the Policy Space
[55] Data-driven latent space representation for robust bipedal locomotion learning
[56] ZSL-RPPO: Zero-Shot Learning for Quadrupedal Locomotion in Challenging Terrains using Recurrent Proximal Policy Optimization
[57] Distributional Successor Features Enable Zero-Shot Policy Optimization
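The zero-shot step this contribution claims — optimizing any reward at test time entirely in latent space — typically reduces to projecting a batch of reward-labelled samples onto the learned task encoder's feature space and conditioning the pretrained policy on the result. The sketch below assumes a linear reward model `r(s) ≈ ψ(s)ᵀ z_r` solved by ridge regression; the function name, the regularizer, and this particular inference scheme are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def infer_task_embedding(task_features, rewards, reg=1e-6):
    """Zero-shot task inference sketch (illustrative, not the paper's code).

    Given a small batch of reward-labelled states at test time, solve the
    ridge regression r(s) ~= psi(s)^T z_r for a task embedding z_r in the
    latent space. A policy pi(a | s, z) pretrained end-to-end on offline
    reward-free transitions can then be conditioned on z_r and deployed
    with no further training or environment interaction.

    task_features: (n, d) array of psi(s_i) for reward-labelled states s_i
    rewards:       (n,) array of observed rewards r(s_i)
    returns:       (d,) task embedding z_r
    """
    d = task_features.shape[1]
    A = task_features.T @ task_features + reg * np.eye(d)
    return np.linalg.solve(A, task_features.T @ rewards)
```

The design choice worth noting is that no network weights change at test time: the only task-specific quantity is the low-dimensional vector `z_r`, which is what makes the adaptation "zero-shot".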
Theoretical analysis connecting TD-JEPA to successor features and policy evaluation
The authors provide theoretical guarantees showing that TD-JEPA with linear predictors avoids representation collapse, recovers a low-rank factorization of successor measures, and minimizes an upper bound on policy evaluation error. These results build on a novel gradient matching argument that generalizes existing analyses of latent-predictive representations.
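Schematically, the factorization claim can be written as follows. The notation here (state-action features $\varphi$, state features $\psi$, data distribution $\rho$, and a per-policy matrix $F^{\pi}$) is illustrative, chosen to match the standard successor-measure definition; it is not necessarily the paper's own.

```latex
% Successor measure of policy \pi (standard definition):
M^{\pi}(X \mid s, a) \;=\; \sum_{t \ge 0} \gamma^{t}\,
  \Pr\!\left(s_{t+1} \in X \,\middle|\, s_{0}=s,\ a_{0}=a,\ \pi\right)

% Claimed low-rank factorization recovered with linear predictors (schematic):
M^{\pi}(\mathrm{d}s' \mid s, a) \;\approx\;
  \varphi(s, a)^{\top} F^{\pi}\, \psi(s')\, \rho(\mathrm{d}s')

% Consequence: any reward r(s') \approx \psi(s')^{\top} z_{r}
% then admits the linear value estimate
Q^{\pi}_{r}(s, a) \;\approx\; \varphi(s, a)^{\top} F^{\pi} z_{r}
```

Under a factorization of this form, the policy-evaluation error for any such reward is controlled by how well the latent TD objective is minimized, which is the sense in which the stated upper bound would connect the representation loss to downstream value accuracy.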