Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: zero-shot reinforcement learning, unsupervised reinforcement learning, successor measure
Abstract:

Behavioral Foundation Models (BFMs) have proved successful at producing near-optimal policies for arbitrary tasks in a zero-shot manner, requiring no test-time retraining or task-specific fine-tuning. Among the most promising BFMs are those that estimate the successor measure, learned in an unsupervised way from task-agnostic offline data. However, these methods fail to react to changes in the dynamics, making them inefficient under partial observability or when the transition function changes. This hinders the applicability of BFMs in real-world settings, e.g., in robotics, where the dynamics can change unexpectedly at test time. In this work, we demonstrate that the Forward–Backward (FB) representation, one of the methods in the BFM family, cannot produce reasonable policies under distinct dynamics, as interference arises among the latent policy representations. To address this, we propose an FB model with a transformer-based belief estimator, which greatly facilitates zero-shot adaptation. Additionally, we show that partitioning the policy encoding space into dynamics-specific clusters, aligned with the context-embedding directions, yields an additional performance gain. These properties allow our method to respond to the dynamics mismatches observed during training and to generalize to unseen ones. Empirically, in the changing-dynamics setting, our approach achieves up to 2x higher zero-shot returns than the baselines on both discrete and continuous tasks.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper extends the Forward–Backward (FB) representation, a behavioral foundation model built on successor measures, to handle dynamics variation at test time. It sits in the 'Successor Measure and Behavioral Foundation Models' leaf under Reinforcement Learning Approaches, which contains only two papers in total. This is a notably sparse research direction within the broader fifty-paper taxonomy, suggesting that successor-measure-based BFMs remain an emerging area. The work addresses a recognized limitation, namely FB models' inability to adapt when the transition function changes, by introducing belief-estimation and policy-space-partitioning mechanisms.

The taxonomy reveals that neighboring leaves pursue alternative strategies for zero-shot dynamics adaptation. Language-Grounded Policy Learning (three papers) conditions on natural language to handle task variation, while Meta-Learning and Online Adaptation (two papers) emphasizes rapid test-time adjustment. World Models and Latent Dynamics (three papers) learns explicit forward models rather than implicit successor representations. The paper's approach diverges from these by retaining the successor measure framework while augmenting it with transformer-based belief tracking, positioning it at the intersection of behavioral priors and partial observability handling.

Among twenty candidates examined across three contributions, none were flagged as clearly refuting the proposed methods. The first contribution (demonstrating FB's limitation under dynamics shifts) examined ten candidates with zero refutations, as did the second (Belief-FB with transformer estimator). The third contribution (Rotation-FB for policy partitioning) had no candidates examined. This suggests that within the limited search scope—focused on top semantic matches and citations—the specific combination of successor measures, belief estimation, and policy space clustering appears relatively unexplored. However, the small candidate pool means the analysis cannot rule out relevant prior work outside the examined set.

The assessment reflects a targeted literature search rather than exhaustive coverage. The sparse population of the taxonomy leaf and absence of refutations among examined candidates indicate the work occupies a less-crowded niche within zero-shot RL. The transformer-based belief mechanism and rotation-based partitioning appear to be novel extensions of the FB framework, though the limited scope—twenty candidates across a fifty-paper taxonomy—leaves open the possibility of overlooked connections in adjacent research areas such as meta-learning or world models.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 20
Refutable papers: 0

Research Landscape Overview

Core task: zero-shot adaptation to unseen environment dynamics. The field addresses how agents and models can generalize to novel environmental conditions without additional training or fine-tuning. The taxonomy reveals several major branches: Reinforcement Learning Approaches explore policy and value function methods that leverage behavioral priors or successor representations; Robotics and Embodied AI tackle physical manipulation and navigation under shifting dynamics; Vision-Language Models and Cross-Modal Learning harness semantic grounding to bridge perceptual gaps; Domain Adaptation and Transfer Learning develop techniques for distributional shift; Dynamical Systems Modeling and Prediction focus on learning transferable forward models; Specialized Application Domains apply zero-shot principles to areas like climate forecasting or wireless networks; and Language Model Generalization examines prompt-based task adaptation. Representative works such as RTFM[2] and BC-Z[16] illustrate how different branches integrate prior knowledge, while Zero-Shot Generalization Survey[1] provides a broad overview of the landscape.

Particularly active lines of work contrast model-free reinforcement learning with model-based dynamics prediction, and explore whether behavioral foundation models can capture reusable skills across environments. Zero-Shot Behavioral Adaptation[0] sits within the Successor Measure and Behavioral Foundation Models cluster under Reinforcement Learning Approaches, emphasizing the construction of reusable behavioral primitives that generalize without environment-specific retraining. This contrasts with works like RTFM Dynamics[3] and Emergent Complexity Transfer[4], which focus more on explicit dynamics modeling or curriculum-based transfer, and with Neuralizer[5] and Genie[6], which leverage generative world models.
The central trade-off revolves around whether to encode environment knowledge implicitly in policies or explicitly in forward models, and how to balance sample efficiency with broad generalization. Open questions include scaling behavioral representations to diverse task families and determining when zero-shot methods can match or exceed few-shot adaptation strategies.

Claimed Contributions

Demonstration of FB limitation under dynamics variation

The authors show theoretically and empirically that Forward-Backward representations fail to adapt to changes in environment dynamics because they average over all possible future states, causing interference in the policy representation space and preventing effective generalization to new or unseen dynamics.

10 retrieved papers
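
For context, the FB mechanism the authors critique can be sketched as follows: the successor measure is factorized as M^pi_z(s, a, ·) ≈ F(s, a, z)^T B(·), a task embedding z is obtained by averaging B over reward-weighted states, and the policy acts greedily on F(s, a, z)^T z. The linear stand-ins `W_f`/`W_b` and the toy dimensions below are illustrative assumptions, not the paper's implementation; the point is that z is computed from data pooled over all dynamics, so the same policy embedding is reused regardless of the current transition function, which is the interference the paper identifies.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, D = 16, 4, 8          # toy state count, action count, embedding dim

# Illustrative stand-ins for the learned forward/backward networks:
# F(s, a, z) and B(s') are simple linear maps here, purely for the sketch.
W_f = rng.normal(size=(S * A, D))
W_b = rng.normal(size=(S, D))

def F(s, a, z):
    # Forward embedding; a real FB model conditions on z nonlinearly.
    return W_f[s * A + a] * z

# Zero-shot task inference: z = E_{s ~ data}[ r(s) * B(s) ].
# The expectation runs over offline data pooled across all dynamics,
# so z carries no information about the current transition function.
reward = rng.normal(size=S)
states = rng.integers(0, S, size=1024)
z = np.mean(reward[states, None] * W_b[states], axis=0)

def greedy_action(s):
    # pi_z(s) = argmax_a F(s, a, z) . z, identical under every dynamics.
    q = np.array([F(s, a, z) @ z for a in range(A)])
    return int(np.argmax(q))

actions = [greedy_action(s) for s in range(S)]
```

Because `z` and the greedy rule never observe the current transitions, the sketch returns one fixed policy per task, which is exactly the behavior the contribution argues breaks under dynamics shift.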
Belief-FB method with transformer-based belief estimator

The authors introduce Belief-FB, a method that uses a permutation-invariant transformer encoder to estimate a belief state over the current environment dynamics in a self-supervised manner, conditioning the forward representation on this inferred context to enable zero-shot adaptation across different dynamics.

10 retrieved papers
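
The belief estimator described above can be illustrated with a minimal permutation-invariant set encoder: one self-attention layer without positional encodings followed by mean pooling over recent transition tuples, so the inferred context depends on which transitions were observed, not their order. All weights and dimensions below are hypothetical placeholders for the paper's transformer.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_model = 6, 16       # transition-tuple (s, a, s') feature size; model width

# Hypothetical learned parameters (random here, for the sketch only).
W_embed = rng.normal(size=(d_in, d_model)) / np.sqrt(d_in)
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def belief(transitions):
    """Encode a set of (s, a, s') tuples into a context/belief vector.

    No positional encoding is added, and the final mean pool is
    order-agnostic, so the output is permutation-invariant.
    """
    h = transitions @ W_embed                      # (n, d_model)
    q, k, v = h @ W_q, h @ W_k, h @ W_v
    attn = softmax(q @ k.T / np.sqrt(d_model))     # (n, n) self-attention
    h = h + attn @ v                               # residual attention block
    return h.mean(axis=0)                          # mean pool over the set

batch = rng.normal(size=(32, d_in))                # 32 recent transitions
ctx = belief(batch)
ctx_shuffled = belief(batch[rng.permutation(32)])  # same belief, any order
```

Shuffling the transitions leaves the belief vector unchanged (up to floating-point noise), which is the property that lets the context be estimated from an unordered window of recent experience.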
Rotation-FB extension for policy space partitioning

The authors propose Rotation-FB, which partitions the policy encoding space into dynamics-specific clusters by sampling task vectors from a von Mises-Fisher distribution centered on inferred context directions, aligning policy representations with environment-specific features to further reduce interference and improve adaptation.

0 retrieved papers
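
The Rotation-FB mechanism can be sketched by drawing unit task vectors from a von Mises-Fisher distribution whose mean direction is the inferred context embedding; a higher concentration kappa pulls the sampled policy codes into a tighter dynamics-specific cluster. The rejection sampler below follows Wood's (1994) construction and is an illustrative stand-in for the paper's procedure; `context_dir` and kappa are hypothetical.

```python
import numpy as np

def sample_vmf(mu, kappa, n, rng):
    """Draw n unit vectors from vMF(mu, kappa) via Wood's (1994) rejection sampler."""
    mu = mu / np.linalg.norm(mu)
    d = mu.size
    b = (-2.0 * kappa + np.sqrt(4.0 * kappa**2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * np.log(1.0 - x0**2)
    out = np.empty((n, d))
    for i in range(n):
        while True:  # rejection step for w, the cosine of the angle to mu
            zb = rng.beta((d - 1) / 2.0, (d - 1) / 2.0)
            w = (1.0 - (1.0 + b) * zb) / (1.0 - (1.0 - b) * zb)
            if kappa * w + (d - 1) * np.log(1.0 - x0 * w) - c >= np.log(rng.uniform()):
                break
        # Uniform direction in the tangent plane orthogonal to mu.
        v = rng.normal(size=d)
        v -= (v @ mu) * mu
        v /= np.linalg.norm(v)
        out[i] = w * mu + np.sqrt(max(1.0 - w * w, 0.0)) * v
    return out

rng = np.random.default_rng(2)
context_dir = rng.normal(size=8)        # hypothetical inferred context embedding
z_samples = sample_vmf(context_dir, kappa=50.0, n=256, rng=rng)
```

With kappa = 50 the sampled task vectors stay close to the context direction; lowering kappa widens the cluster, and kappa approaching 0 recovers a uniform distribution on the sphere, i.e., no dynamics-specific partitioning.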

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Demonstration of FB limitation under dynamics variation


Contribution

Belief-FB method with transformer-based belief estimator


Contribution

Rotation-FB extension for policy space partitioning
