Learning Massively Multitask World Models for Continuous Control

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: reinforcement learning, world models, continuous control
Abstract:

General-purpose control demands agents that act across many tasks and embodiments, yet research on reinforcement learning (RL) for continuous control remains dominated by single-task or offline regimes, reinforcing a view that online RL does not scale. Inspired by the foundation model recipe (large-scale pretraining followed by light RL), we ask whether a single agent can be trained on hundreds of tasks with online interaction. To accelerate research in this direction, we introduce a new benchmark with 200 diverse tasks spanning many domains and embodiments, each with language instructions, demonstrations, and optionally image observations. We then present Newt, a language-conditioned multitask world model that is first pretrained on demonstrations to acquire task-aware representations and action priors, and then jointly optimized with online interaction across all tasks. Experiments show that Newt yields better multitask performance and data efficiency than a set of strong baselines, exhibits strong open-loop control, and enables rapid adaptation to unseen tasks. We release our environments, demonstrations, code for training and evaluation, as well as 200+ checkpoints. Website: https://newt-world-models.github.io

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces MMBench, a 200-task benchmark spanning diverse domains and embodiments, and Newt, a language-conditioned world model pretrained on demonstrations then fine-tuned with online RL. It resides in the 'Generalist World Model Pretraining' leaf, which contains only three papers total. This is a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting the work targets an emerging area where large-scale multitask world model pretraining remains underexplored compared to more established branches like cross-embodiment transfer or policy architectures.

The taxonomy reveals neighboring directions that share conceptual ground but differ in scope. 'Latent World Models for Continuous Control' focuses on trajectory optimization without large-scale pretraining, while sibling categories like 'Cross-Embodiment Transfer' and 'Language and Multimodal Grounding' address complementary challenges of morphology generalization and linguistic task specification. The paper's emphasis on massively multitask online RL distinguishes it from purely offline or single-task model-based methods, positioning it at the intersection of world model learning, language conditioning, and scalable online interaction across hundreds of tasks.

Among the 25 candidates examined, the contribution-level analysis shows mixed novelty signals. For the benchmark contribution (MMBench), 10 candidates were examined with zero refutations, suggesting little prior work on 200-task continuous control benchmarks at this scale. For the Newt architecture, 5 candidates were examined with no refutations, indicating that the specific combination of language conditioning and world model pretraining may be relatively novel. For the demonstration of massively multitask online RL, however, 10 candidates were examined and 1 refutable match was found, suggesting some prior exploration of large-scale multitask online learning, though the search scope was limited.

Based on the top-25 semantic matches examined, the work appears to occupy a sparsely populated research direction, particularly in combining world model pretraining with hundreds of tasks and online interaction. The limited search scope means the analysis cannot rule out relevant prior work outside the candidate set, especially in adjacent areas like offline multitask RL or smaller-scale world model benchmarks. The taxonomy structure suggests the field is still consolidating around how to scale model-based multitask learning effectively.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 1

Research Landscape Overview

Core task: multitask reinforcement learning for continuous control across diverse tasks and embodiments. The field is organized around several complementary directions:

- Multitask Policy Architectures and Parameter Sharing explores how to design networks that efficiently handle multiple tasks, often through attention mechanisms or mixture-of-experts structures like Attention Mixture Experts[3].
- Cross-Task Knowledge Transfer and Guidance investigates how learned policies can inform or accelerate training on new tasks, as seen in Cross-Task Policy Guidance[4].
- Cross-Embodiment Transfer and Generalization addresses the challenge of applying policies across different robot morphologies, with works like Cross-Embodied Learning[8] and Universal Morphology Control[34] tackling morphology-agnostic control.
- World Models and Model-Based Multitask Learning focuses on learning predictive models that generalize across tasks, exemplified by TD-MPC2[5] and GenRL[16].
- Language and Multimodal Grounding for Embodied Control leverages linguistic or multimodal signals to guide policies, as in Code as Policies[6] and Multimodal LLMs Embodied[20].
- Imitation Learning and Demonstration-Based Multitask Methods use expert data to bootstrap multitask policies, while Temporal Abstraction and Hierarchical Multitask Learning decomposes complex behaviors into reusable skills.
- Specialized Multitask Applications target domain-specific challenges, and Benchmarks, Frameworks, and Evaluation Infrastructure provides standardized testbeds like Benchmark Multitask Continuous[41].

A particularly active line of work centers on generalist world model pretraining, where the goal is to learn a single predictive model that can support planning or policy learning across a wide range of tasks and embodiments. Massively Multitask World Models[0] sits squarely in this branch, emphasizing large-scale pretraining of world models to enable zero-shot or few-shot transfer.
This approach contrasts with more task-specific model-based methods and shares conceptual ground with GenRL[16] and Generalist World Model[19], which similarly pursue broad generalization through learned dynamics. Compared to TD-MPC2[5], which focuses on efficient online planning with a compact model, Massively Multitask World Models[0] scales up the diversity of training data and tasks to achieve broader coverage. The central trade-off in this area involves balancing model capacity and training scale against sample efficiency and computational cost, with open questions around how much task diversity is needed and whether a single world model can truly capture the full spectrum of embodied control challenges.

Claimed Contributions

MMBench: A benchmark for massively multitask reinforcement learning

The authors introduce MMBench, the first benchmark designed for massively multitask RL, comprising 200 continuous control tasks across 10 domains with language instructions, demonstrations, and support for both state and RGB observations. This includes 41 new tasks and a new task suite called MiniArcade with 19 arcade-style environments.

10 retrieved papers
Newt: A language-conditioned multitask world model

The authors present Newt, a model-based RL agent that extends TD-MPC2 to the massively multitask online setting. It uses a self-predictive world model conditioned on language instructions and optionally images, with algorithmic improvements including model-based pretraining on demonstrations, additional action supervision via behavior cloning loss, and constrained planning.

5 retrieved papers
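The objective described above combines a self-predictive world model loss (predicted next latent matched against the encoded next observation) with a behavior-cloning term on demonstration actions. The toy below sketches that combination with linear stand-ins for the networks; it is a minimal illustration under stated assumptions, not the authors' implementation, and every name (`encode`, `dynamics`, `policy`, `bc_weight`) is hypothetical. The real model would also condition on a language embedding and use stop-gradient targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "networks" standing in for the learned components.
W_enc = rng.normal(size=(8, 4))      # observation (8-d) -> latent (4-d)
W_dyn = rng.normal(size=(4 + 2, 4))  # [latent, action] -> predicted next latent
W_pi = rng.normal(size=(4, 2))       # latent -> action

def encode(obs):
    return obs @ W_enc

def dynamics(z, a):
    return np.concatenate([z, a], axis=-1) @ W_dyn

def policy(z):
    return np.tanh(z @ W_pi)

def newt_style_loss(obs, next_obs, demo_action, bc_weight=0.1):
    """Self-predictive consistency loss plus a behavior-cloning term."""
    z = encode(obs)
    z_next_pred = dynamics(z, demo_action)
    z_next_target = encode(next_obs)  # stop-gradient target in practice
    consistency = np.mean((z_next_pred - z_next_target) ** 2)
    bc = np.mean((policy(z) - demo_action) ** 2)
    return consistency + bc_weight * bc

# One batch of (obs, next_obs, demo_action) transitions from demonstrations.
obs = rng.normal(size=(16, 8))
next_obs = rng.normal(size=(16, 8))
demo_action = np.tanh(rng.normal(size=(16, 2)))
loss = newt_style_loss(obs, next_obs, demo_action)
```

During the demonstration-pretraining phase both terms are driven by expert data; once online interaction begins, the BC term can be kept as a regularizer while reward and value losses (omitted here) take over.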
Demonstration of feasibility and effectiveness of massively multitask online RL

The authors demonstrate that training a single agent via online RL on hundreds of tasks simultaneously is feasible and effective. Their experiments show Newt outperforms strong baselines in multitask performance and data efficiency, can perform open-loop control over long horizons, and transfers well to unseen tasks through few-shot finetuning.

10 retrieved papers
Can Refute: 1 retrieved paper

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

MMBench: A benchmark for massively multitask reinforcement learning

The authors introduce MMBench, the first benchmark designed for massively multitask RL, comprising 200 continuous control tasks across 10 domains with language instructions, demonstrations, and support for both state and RGB observations. This includes 41 new tasks and a new task suite called MiniArcade with 19 arcade-style environments.

Contribution

Newt: A language-conditioned multitask world model

The authors present Newt, a model-based RL agent that extends TD-MPC2 to the massively multitask online setting. It uses a self-predictive world model conditioned on language instructions and optionally images, with algorithmic improvements including model-based pretraining on demonstrations, additional action supervision via behavior cloning loss, and constrained planning.

Contribution

Demonstration of feasibility and effectiveness of massively multitask online RL

The authors demonstrate that training a single agent via online RL on hundreds of tasks simultaneously is feasible and effective. Their experiments show Newt outperforms strong baselines in multitask performance and data efficiency, can perform open-loop control over long horizons, and transfers well to unseen tasks through few-shot finetuning.