Discovering Diverse Behaviors via Temporal Contrastive Learning

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: reinforcement learning, exploration, intrinsic motivation, surprise, empowerment, contrastive learning
Abstract:

Effective exploration in reinforcement learning requires not only tracking where an agent has been, but also understanding how the agent perceives and represents the world. To learn powerful representations, an agent should actively explore states that contribute to its knowledge of the environment. Temporal representations can capture the information necessary to solve a wide range of potential tasks while avoiding the computational cost associated with full state reconstruction. In this paper, we propose an exploration method that leverages temporal contrastive representations to guide exploration, prioritizing states with unpredictable future outcomes. We demonstrate that such representations can enable the learning of complex exploratory behaviors in locomotion, manipulation, and embodied-AI tasks, revealing capabilities and behaviors that traditionally require extrinsic rewards. Unlike approaches that rely on explicit distance learning or episodic memory mechanisms (e.g., quasimetric-based methods), our method builds directly on temporal similarities, yielding a simpler yet effective strategy for exploration.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes an exploration method that uses temporal contrastive representations to prioritize states with unpredictable future outcomes, generating intrinsic rewards from prediction errors in learned embeddings. It resides in the 'Curiosity-Driven Exploration via Temporal Contrastive Representations' leaf, which contains four papers total (including this one). This leaf sits within a broader branch of 'Temporal Contrastive Representation Learning for Exploration and Control' (four leaves, approximately 16 papers), indicating a moderately active research direction. The taxonomy shows this is not an isolated niche but part of a structured exploration-focused subfield.

The taxonomy reveals neighboring leaves focused on visual control (four papers), spatial-temporal fusion (three papers), and hierarchical goal-conditioned RL (three papers). These adjacent directions share the use of temporal contrastive objectives but diverge in application: visual control emphasizes visuomotor policies from high-dimensional observations, while hierarchical methods discover subgoals for planning. The paper's focus on curiosity-driven exploration distinguishes it from these control-centric approaches, though the underlying contrastive mechanism connects across boundaries. The taxonomy's scope and exclude notes clarify that this work targets intrinsic motivation rather than policy learning or multi-agent coordination.

Among 18 candidates examined, the contribution-level analysis shows mixed novelty signals. The core exploration method (4 candidates examined, 1 refutable) and intrinsic reward mechanism (10 candidates examined, 1 refutable) both encounter at least one overlapping prior work within the limited search scope. The third contribution—positioning as a simpler alternative to quasimetric methods—shows no refutable candidates among 4 examined, suggesting this framing may be more distinctive. The statistics indicate a focused but not exhaustive search: the presence of refutable candidates does not imply the ideas are well-trodden, only that some relevant prior work exists within the top-K semantic matches.

Based on the limited search scope (18 candidates from semantic retrieval), the work appears to build on established principles of temporal contrastive learning for exploration, with some contributions showing overlap in the examined literature. The taxonomy context suggests the paper operates in a moderately populated research area where temporal contrastive methods for curiosity are actively studied. The analysis does not cover exhaustive citation networks or domain-specific venues, so additional related work may exist beyond the examined candidates.

Taxonomy

Core-task Taxonomy Papers: 27
Claimed Contributions: 3
Contribution Candidate Papers Compared: 18
Refutable Papers: 2

Research Landscape Overview

Core task: exploration via temporal contrastive representations. This field leverages contrastive learning over temporal sequences to build representations that guide exploration and decision-making in reinforcement learning and beyond. The taxonomy reveals five main branches. The first branch focuses on temporal contrastive representation learning for exploration and control, encompassing curiosity-driven methods that use temporal contrasts to identify novel states or behaviors. A second branch examines transfer and multi-agent coordination, where temporal contrastive objectives help agents generalize across tasks or coordinate in shared environments. A third branch extends these ideas to time-series analysis and non-RL applications such as video understanding and crop mapping. The fourth branch targets resource-aware decision-making, using temporal action representations to optimize under computational or physical constraints. Finally, a fifth branch explores the intersection of generative AI and temporal contrastive learning in RL, investigating how generative models can enhance or complement contrastive objectives.

Within the exploration and control branch, many studies design intrinsic rewards or curiosity signals by contrasting temporally adjacent or distant observations. Temporal Contrastive Behaviors[0] sits squarely in this curiosity-driven cluster, alongside works like Curiosity Temporal Contrastive[24] and Temporal Inconsistency Exploration[17], which similarly exploit temporal structure to drive exploration. Compared to Episodic Novelty Distance[11], which measures novelty via episodic memory, Temporal Contrastive Behaviors[0] emphasizes learning contrastive embeddings that capture behavioral dynamics over time. Meanwhile, foundational methods such as Time-Contrastive Networks[16] established early principles for temporal contrast, and recent efforts like STACoRe[3] and Premier-TACO[9] refine these ideas with state-action or hierarchical abstractions.

The central trade-off across these lines involves balancing the granularity of temporal contrasts (consecutive frames, longer horizons, or abstract action sequences) against computational cost and sample efficiency.

Claimed Contributions

Exploration method using temporal contrastive representations

The authors introduce an exploration approach that uses temporal contrastive learning to learn representations capturing future state occupancy. The method rewards agents for visiting states with unpredictable futures, enabling complex exploratory behaviors without extrinsic rewards.
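The contrastive objective this contribution relies on can be illustrated with a minimal numpy sketch of an InfoNCE loss over temporally paired states. This is not the authors' implementation: the linear encoder `phi`, the batch construction, and the temperature value are hypothetical stand-ins; the paper's actual objective and architecture may differ.

```python
import numpy as np

def infonce_loss(anchors, positives, temperature=0.1):
    """InfoNCE over a batch: anchors[i] should match positives[i];
    the other positives in the batch act as negatives."""
    # Normalize embeddings so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct (temporally paired) match lies on the diagonal.
    return -np.mean(np.diag(log_probs))

# Toy usage: states embedded by a hypothetical linear encoder phi;
# positives are embeddings of states a few steps into the future.
rng = np.random.default_rng(0)
phi = rng.normal(size=(8, 16))                       # stand-in encoder weights
states_t = rng.normal(size=(32, 8))                  # batch of current states
states_tk = states_t + 0.1 * rng.normal(size=(32, 8))  # nearby future states
loss = infonce_loss(states_t @ phi, states_tk @ phi)
print(f"InfoNCE loss: {loss:.3f}")
```

Minimizing such a loss pulls embeddings of temporally close states together, so the learned representation encodes which futures tend to follow a given state, which is the signal the exploration bonus then exploits.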

4 retrieved papers
Can Refute

Intrinsic reward based on temporal representation prediction error

The paper presents a novel intrinsic reward signal derived from the prediction error of learned temporal contrastive representations. This reward encourages agents to explore states that are less informative about future states according to the contrastive model.
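The reward mechanism described here can be sketched in a few lines of numpy. This is an illustrative sketch, not the paper's method: the linear predictor `W_pred` and the pre-computed embeddings are hypothetical stand-ins for whatever forward model and encoder the authors actually use.

```python
import numpy as np

def intrinsic_reward(phi_s, phi_s_next, W_pred):
    """Bonus = squared error of a (hypothetical) linear predictor W_pred
    mapping the current embedding to the next-state embedding; transitions
    the model predicts poorly receive a larger exploration bonus."""
    predicted = phi_s @ W_pred
    return np.sum((predicted - phi_s_next) ** 2, axis=-1)

rng = np.random.default_rng(1)
d = 16
W_pred = np.eye(d)                     # stand-in predictor: identity map
phi_s = rng.normal(size=(4, d))        # embeddings of current states
phi_s_next = phi_s.copy()              # three transitions are predictable...
phi_s_next[0] += rng.normal(size=d)    # ...one transition is not
r = intrinsic_reward(phi_s, phi_s_next, W_pred)
print(int(r.argmax()))  # → 0: the unpredictable transition earns the bonus
```

In an RL loop this scalar would be added to (or substituted for) the environment reward, steering the policy toward states whose futures the contrastive model cannot yet predict.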

10 retrieved papers
Can Refute

Simpler alternative to quasimetric-based exploration methods

The authors develop a method that avoids the complexity of quasimetric learning and episodic memory used in prior work like ETD. Their approach works directly with temporal similarities, making it more amenable to off-policy RL algorithms while maintaining competitive performance.

4 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Exploration method using temporal contrastive representations

The authors introduce an exploration approach that uses temporal contrastive learning to learn representations capturing future state occupancy. The method rewards agents for visiting states with unpredictable futures, enabling complex exploratory behaviors without extrinsic rewards.

Contribution

Intrinsic reward based on temporal representation prediction error

The paper presents a novel intrinsic reward signal derived from the prediction error of learned temporal contrastive representations. This reward encourages agents to explore states that are less informative about future states according to the contrastive model.

Contribution

Simpler alternative to quasimetric-based exploration methods

The authors develop a method that avoids the complexity of quasimetric learning and episodic memory used in prior work like ETD. Their approach works directly with temporal similarities, making it more amenable to off-policy RL algorithms while maintaining competitive performance.