Discovering Diverse Behaviors via Temporal Contrastive Learning
Overview
Overall Novelty Assessment
The paper proposes an exploration method that uses temporal contrastive representations to prioritize states with unpredictable future outcomes, generating intrinsic rewards from prediction errors in learned embeddings. It resides in the 'Curiosity-Driven Exploration via Temporal Contrastive Representations' leaf, which contains four papers total (including this one). This leaf sits within a broader branch of 'Temporal Contrastive Representation Learning for Exploration and Control' (four leaves, approximately 16 papers), indicating a moderately active research direction. The taxonomy shows this is not an isolated niche but part of a structured exploration-focused subfield.
The taxonomy reveals neighboring leaves focused on visual control (four papers), spatial-temporal fusion (three papers), and hierarchical goal-conditioned RL (three papers). These adjacent directions share the use of temporal contrastive objectives but diverge in application: visual control emphasizes visuomotor policies from high-dimensional observations, while hierarchical methods discover subgoals for planning. The paper's focus on curiosity-driven exploration distinguishes it from these control-centric approaches, though the underlying contrastive mechanism connects across boundaries. The taxonomy's scope and exclude notes clarify that this work targets intrinsic motivation rather than policy learning or multi-agent coordination.
Among the 18 candidates examined, the contribution-level analysis shows mixed novelty signals. The core exploration method (4 candidates examined, 1 refutable) and the intrinsic reward mechanism (10 candidates examined, 1 refutable) each overlap with at least one prior work within the limited search scope. The third contribution, positioning the method as a simpler alternative to quasimetric approaches, shows no refutable candidates among the 4 examined, suggesting this framing may be more distinctive. These statistics reflect a focused rather than exhaustive search: the presence of refutable candidates does not imply the ideas are well-trodden, only that some overlapping prior work exists among the top-K semantic matches.
Based on the limited search scope (18 candidates from semantic retrieval), the work appears to build on established principles of temporal contrastive learning for exploration, with some contributions showing overlap in the examined literature. The taxonomy context suggests the paper operates in a moderately populated research area where temporal contrastive methods for curiosity are actively studied. The analysis does not cover exhaustive citation networks or domain-specific venues, so additional related work may exist beyond the examined candidates.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce an exploration approach that uses temporal contrastive learning to learn representations capturing future state occupancy. The method rewards agents for visiting states with unpredictable futures, enabling complex exploratory behaviors without extrinsic rewards.
The paper presents a novel intrinsic reward signal derived from the prediction error of learned temporal contrastive representations. This reward encourages agents to explore states that are less informative about future states according to the contrastive model.
The authors develop a method that avoids the complexity of quasimetric learning and episodic memory used in prior work like ETD. Their approach works directly with temporal similarities, making it more amenable to off-policy RL algorithms while maintaining competitive performance.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[11] Episodic novelty through temporal distance
[17] Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning
[24] Curiosity-Driven Exploration via Temporal Contrastive Learning
Contribution Analysis
Detailed comparisons for each claimed contribution
Exploration method using temporal contrastive representations
The authors introduce an exploration approach that uses temporal contrastive learning to learn representations capturing future state occupancy. The method rewards agents for visiting states with unpredictable futures, enabling complex exploratory behaviors without extrinsic rewards.
[24] Curiosity-Driven Exploration via Temporal Contrastive Learning
[7] Temporal abstractions-augmented temporally contrastive learning: An alternative to the Laplacian in RL
[38] Contrastive difference predictive coding
[39] Curiosity-driven learning in artificial intelligence and its applications
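The contrastive mechanism underlying this contribution can be sketched concretely. The paper's exact objective is not reproduced in this report, so the snippet below uses a generic InfoNCE loss over (state, future-state) embedding pairs, a standard instantiation of temporal contrastive learning; the batch construction, dimensions, and the `info_nce_loss` helper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce_loss(anchor, positive, temperature=0.1):
    """InfoNCE over a batch: each anchor's positive is the future state
    from the same trajectory; the other rows in the batch serve as
    negatives. Rows are L2-normalized embeddings."""
    anchor = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    positive = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = anchor @ positive.T / temperature       # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # positives on the diagonal

# Toy batch: embeddings of states s_t (anchors) and s_{t+k} (positives).
B, d = 8, 16
phi_s = rng.normal(size=(B, d))
psi_future = phi_s + 0.05 * rng.normal(size=(B, d))  # correlated futures
loss = info_nce_loss(phi_s, psi_future)
```

Minimizing this loss pushes embeddings of states and their actual futures together while pushing apart futures from other trajectories, which is how the representation comes to encode future state occupancy.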
Intrinsic reward based on temporal representation prediction error
The paper presents a novel intrinsic reward signal derived from the prediction error of learned temporal contrastive representations. This reward encourages agents to explore states that are less informative about future states according to the contrastive model.
[31] Latent world models for intrinsically motivated exploration
[28] Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization
[29] A reinforcement learning method of solving Markov decision processes: an adaptive exploration model based on temporal difference error
[30] Tracking emotions: intrinsic motivation grounded on multi-level prediction error dynamics
[32] Variational state encoding as intrinsic motivation in reinforcement learning
[33] Temporal difference uncertainties as a signal for exploration
[34] In search of the neural circuits of intrinsic motivation
[35] Phasic dopamine as a prediction error of intrinsic and extrinsic reinforcements driving both action acquisition and reward maximization: A simulated robotic study
[36] Surprise signals in the supplementary eye field: rectified prediction errors drive exploration-exploitation transitions
[37] Prediction-Error-Based Intrinsically Motivated Saccade Learning
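The reward mechanism this contribution claims can be sketched as follows. The paper's actual reward definition is not given in this report; the snippet assumes a prediction-error form, with a hypothetical linear `predict_future` head standing in for a learned predictor over the contrastive embeddings.

```python
import numpy as np

rng = np.random.default_rng(1)
d, batch = 16, 32

# Hypothetical linear predictor standing in for a learned prediction head.
W = 0.1 * rng.normal(size=(d, d))
predict_future = lambda z: z @ W

def intrinsic_reward(phi_s, psi_next):
    """Squared error between the predicted and the observed future-state
    embedding; states whose futures the model captures poorly receive
    larger rewards, directing the agent toward unpredictable regions."""
    return np.sum((predict_future(phi_s) - psi_next) ** 2, axis=-1)

phi = rng.normal(size=(batch, d))   # embeddings of current states
psi = rng.normal(size=(batch, d))   # embeddings of observed next states
r_int = intrinsic_reward(phi, psi)  # one non-negative reward per transition
```

Because the reward is a per-transition scalar computed from embeddings alone, it can be added to (or substituted for) the extrinsic reward in any standard RL update.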
Simpler alternative to quasimetric-based exploration methods
The authors develop a method that avoids the complexity of quasimetric learning and episodic memory used in prior work like ETD. Their approach works directly with temporal similarities, making it more amenable to off-policy RL algorithms while maintaining competitive performance.
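The claimed off-policy amenability can be illustrated with a small relabeling sketch. Everything here is an assumption for illustration: the `temporal_similarity` and `relabel_batch` helpers and the random linear encoders are hypothetical stand-ins, not the authors' method. The point is only that a reward defined directly on temporal similarities can be recomputed for any stored transition with the current encoders, so no episodic memory or rewards frozen at collection time are required.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

def temporal_similarity(phi_s, psi_next):
    """Cosine similarity between current-state and next-state embeddings."""
    a = phi_s / np.linalg.norm(phi_s, axis=1, keepdims=True)
    b = psi_next / np.linalg.norm(psi_next, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def relabel_batch(batch, encode_s, encode_next):
    """Recompute intrinsic rewards for an off-policy minibatch with the
    *current* encoders: transitions sampled from a replay buffer are
    relabeled on the fly, which is what makes the reward off-policy
    friendly."""
    sim = temporal_similarity(encode_s(batch["s"]), encode_next(batch["s_next"]))
    batch = dict(batch)
    batch["r"] = -sim  # low similarity (unpredictable future) => high reward
    return batch

# Hypothetical fixed random encoders standing in for learned networks.
A, B = rng.normal(size=(d, d)), rng.normal(size=(d, d))
encode_s = lambda x: x @ A
encode_next = lambda x: x @ B

replay_batch = {"s": rng.normal(size=(64, d)), "s_next": rng.normal(size=(64, d))}
out = relabel_batch(replay_batch, encode_s, encode_next)
```

Since cosine similarities are bounded in [-1, 1], the relabeled rewards are bounded as well, which keeps the critic targets stable regardless of how old the sampled transitions are.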