R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
Overview
Overall Novelty Assessment
R2-Dreamer proposes a decoder-free MBRL framework using a redundancy reduction objective inspired by Barlow Twins to prevent representation collapse without data augmentation. The paper sits in the Reconstruction-Free Approaches leaf, which contains four papers total, including the original work. This leaf is part of the broader Representation Learning Objectives and Architectures branch, which encompasses reconstruction-based methods, multi-objective frameworks, and structured representations. The reconstruction-free direction appears moderately populated, suggesting active but not overcrowded research interest in alternatives to pixel-level reconstruction.
The taxonomy reveals that R2-Dreamer's neighbors include methods using contrastive learning, prototypes, and other non-reconstructive objectives. Adjacent leaves contain reconstruction-based approaches that rely on pixel prediction and multi-objective frameworks that combine multiple learning signals. The Robustness to Visual Distractions branch includes closely related work on data augmentation and self-supervision, which R2-Dreamer explicitly aims to avoid. The scope notes clarify that reconstruction-free methods distinguish themselves by eschewing pixel-level objectives, while methods combining both reconstruction and other objectives belong under Multi-Objective Learning.
Of the twenty-five candidates examined in total, the framework contribution has one refutable candidate out of nine, and the representation learning paradigm contribution likewise has one out of six. The benchmark contribution appears more novel, with zero refutable candidates among the ten examined. These statistics reflect top-K semantic matches plus citation expansion, not exhaustive coverage of the literature. Overall, the framework and paradigm contributions face more substantial overlap with prior work, while the evaluation benchmark appears less contested within the examined scope.
Based on the limited search of twenty-five candidates, R2-Dreamer's core technical contributions appear to have some overlap with existing reconstruction-free methods, particularly regarding the framework design and representation learning approach. The benchmark contribution shows stronger novelty signals within the examined scope. The analysis does not cover the full breadth of MBRL literature, focusing instead on semantically similar papers and their citations.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce R2-Dreamer, a model-based reinforcement learning framework that replaces pixel reconstruction and data augmentation with a feature redundancy reduction objective inspired by Barlow Twins. This internal regularizer prevents representation collapse without requiring external augmentations.
The authors propose a new approach to representation learning in Recurrent State-Space Model (RSSM) architectures that eliminates both the image decoder and data augmentation, replacing heuristic augmentation strategies with an internal redundancy reduction mechanism.
The authors introduce DMC-Subtle, a new challenging benchmark suite where task-critical objects are significantly reduced in size compared to standard DeepMind Control tasks. This benchmark is designed to test methods' ability to focus on subtle but essential visual information.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[21] DreamerPro: Reconstruction-free model-based reinforcement learning with prototypical representations
[22] Visual Pretraining via Contrastive Predictive Model for Pixel-Based Reinforcement Learning
[28] Dream to generalize: zero-shot model-based reinforcement learning for unseen visual distractions
Contribution Analysis
Detailed comparisons for each claimed contribution
R2-Dreamer framework with internal redundancy reduction objective
The authors introduce R2-Dreamer, a model-based reinforcement learning framework that replaces pixel reconstruction and data augmentation with a feature redundancy reduction objective inspired by Barlow Twins. This internal regularizer prevents representation collapse without requiring external augmentations.
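As background, the Barlow Twins objective that R2-Dreamer adapts drives the cross-correlation matrix of two embedding batches toward the identity: matched feature dimensions are aligned (invariance term) while distinct dimensions are decorrelated (redundancy reduction term). The following is a minimal NumPy sketch of that standard loss; the function name and the weighting `lam` are illustrative, and R2-Dreamer's exact formulation may differ.

```python
import numpy as np

def barlow_redundancy_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins-style loss over two (batch, dim) embedding batches:
    align matched feature dimensions, decorrelate the rest."""
    n, d = z_a.shape
    # Standardize each feature dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    # Cross-correlation matrix between the two batches, shape (dim, dim).
    c = z_a.T @ z_b / n
    on_diag = ((1.0 - np.diag(c)) ** 2).sum()            # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

A collapsed representation (all feature dimensions identical) yields an all-ones correlation matrix and a large off-diagonal penalty, which is why this term can guard against collapse without relying on external augmentations.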
[58] A Simple Framework for Self-Supervised Learning of Sample-Efficient World Models
[51] Regularized latent dynamics prediction is a strong baseline for behavioral foundation models
[52] Self-supervised representations for multi-view reinforcement learning
[53] Combining reconstruction and contrastive methods for multimodal representations in RL
[54] BarlowRL: Barlow Twins for data-efficient reinforcement learning
[55] Learning Disentangled Representations for Deep Reinforcement Learning using Self-Supervised Learning
[56] A Survey on Joint Embedding Predictive Architectures and World Models
[57] Learning Latent Multimodal Dynamics for Optimized Resource Planning
[59] Learning Minimal Representations with Model Invariance
New representation learning paradigm for RSSM-based decoder-free MBRL
The authors propose a new approach to representation learning in Recurrent State-Space Model (RSSM) architectures that eliminates both the image decoder and data augmentation, replacing heuristic augmentation strategies with an internal redundancy reduction mechanism.
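To make the "internal" aspect concrete, the sketch below shows how a decorrelation objective could be wired into one RSSM update without a decoder: the two correlated feature batches come from the model itself rather than from augmented views. Pairing the encoder's posterior features with the dynamics model's prior prediction is an assumption for illustration only, not necessarily R2-Dreamer's exact design; the arrays stand in for real network outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def standardized_cross_corr(x, y):
    """Cross-correlation matrix of two (batch, dim) feature batches."""
    x = (x - x.mean(0)) / (x.std(0) + 1e-8)
    y = (y - y.mean(0)) / (y.std(0) + 1e-8)
    return x.T @ y / len(x)

# Stand-ins for one RSSM step: the encoder's posterior features for a
# batch of observations, and the dynamics model's prior prediction of the
# same latent state. Both are produced internally, so no image decoder
# and no augmented views are required.
batch, dim = 256, 16
posterior = rng.normal(size=(batch, dim))
prior = posterior + 0.1 * rng.normal(size=(batch, dim))  # imperfect prediction

c = standardized_cross_corr(posterior, prior)
invariance = ((1.0 - np.diag(c)) ** 2).sum()            # align matched dims
redundancy = (c ** 2).sum() - (np.diag(c) ** 2).sum()   # decorrelate the rest
latent_loss = invariance + 5e-3 * redundancy
```

In a reconstruction-based world model, `latent_loss` would instead be a pixel-prediction error from a decoder head; here the regularizer operates entirely in feature space.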
[71] DreamingV2: Reinforcement learning with discrete world models without reconstruction
[69] Uncertainty Representations in State-Space Layers for Deep Reinforcement Learning under Partial Observability
[70] Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
[72] Task-Prompt Generalised World Model in Multi-Environment Offline Reinforcement Learning
[73] M3PO: Massively Multi-Task Model-Based Policy Optimization
[74] Does Visual Latent Quality Improve Dreamer-Style Model-Based RL?
DMC-Subtle benchmark for evaluating representation learning
The authors introduce DMC-Subtle, a new challenging benchmark suite where task-critical objects are significantly reduced in size compared to standard DeepMind Control tasks. This benchmark is designed to test methods' ability to focus on subtle but essential visual information.