Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: Reinforcement Learning, Hierarchical Reinforcement Learning, Behavior Foundation Models, Humanoid Control
Abstract:

Recent advancements in imitation learning for robotic control have led to transformer-based behavior foundation models (BFMs) that enable multi-modal, human-like control for humanoid agents. These models generate solutions when conditioned on high-level goals or prompts, for example, walking to a coordinate when conditioned on the position of the robot's pelvis. While excelling at zero-shot generation of robust behaviors, BFMs often require meticulous prompt engineering for specific tasks, potentially yielding suboptimal results. In this work, we introduce "Task Tokens", a method to effectively tailor BFMs to specific tasks while preserving their flexibility. Our approach integrates naturally within the transformer architecture of BFMs. Task Tokens trains a task-specific encoder (tokenizer), with the original BFM remaining untouched. Our method reduces trainable parameters per task by up to ×125 and converges up to ×6 faster compared to standard baselines. In addition, by keeping the original BFM unchanged, Task Tokens enables utilizing the pre-existing encoders. This allows incorporating user-defined priors, balancing reward design and prompt engineering. We demonstrate Task Tokens' efficacy across various tasks, including out-of-distribution scenarios, and show their compatibility with other prompting modalities. Our results suggest that Task Tokens offer a promising approach for adapting BFMs to specific control tasks while retaining their generalization capabilities.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Task Tokens, a method for adapting behavior foundation models (BFMs) to specific tasks by training a task-specific encoder while keeping the original BFM frozen. Within the taxonomy, this work resides in the Prompt-Based and Token-Based Adaptation leaf, which contains only three papers total. This is a relatively sparse research direction compared to broader branches like Domain-Specific Adaptation Applications or Full Model Fine-Tuning. The sibling papers in this leaf explore related prompt-based mechanisms, suggesting that token-based adaptation for behavioral control is an emerging but not yet crowded area.

The taxonomy reveals that Task Tokens sits at the intersection of Parameter-Efficient Adaptation Methods and Behavioral Foundation Models. Neighboring leaves include Memory-Efficient and Zeroth-Order Optimization (which addresses forward-only adaptation) and Humanoid and Robotic Control (which focuses on whole-body control architectures). The scope note for Prompt-Based Adaptation explicitly excludes methods that update model weights, positioning Task Tokens as a pure conditioning approach. This distinguishes it from full fine-tuning branches and aligns it with works that manipulate input representations rather than internal parameters.

Among the three contributions analyzed, the parameter-efficiency claim examined ten candidates and found six potentially refutable prior works, indicating substantial overlap with existing parameter-efficient methods in the broader literature. The core Task Tokens mechanism examined five candidates with zero refutations, suggesting greater novelty in the specific application to behavioral control. The hybrid control paradigm examined ten candidates with no refutations, though this may reflect the limited search scope (twenty-five total candidates) rather than definitive novelty. The analysis does not claim exhaustive coverage of all relevant prior work.

Based on the limited search scope, Task Tokens appears to occupy a relatively sparse niche within prompt-based adaptation for behavioral foundation models. The parameter-efficiency aspect shows more overlap with existing techniques, while the application to humanoid control and the hybrid control paradigm appear less explored. The analysis reflects top-K semantic matches and does not guarantee comprehensive coverage of all related work in robotics or transformer-based control.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 6

Research Landscape Overview

Core task: adapting behavior foundation models to specific tasks. The field has organized itself around several major branches that reflect different strategic emphases. Parameter-Efficient Adaptation Methods explore lightweight techniques—such as prompt-based and token-based approaches (e.g., Task Tokens[0], Self-regulating Prompts[4])—that modify only a small subset of parameters or inject learnable tokens to steer pre-trained models toward new objectives. Full Model Fine-Tuning encompasses end-to-end retraining strategies, including works that align models with human preferences (Human Preferences Fine-tuning[19]) or address domain-specific constraints (Fault Diagnosis Fine-tuning[5]). Domain-Specific Adaptation Applications demonstrate how foundation models are tailored to specialized contexts—ranging from health coaching (Health Coaching LLMs[7], Physical Activity Coaching[17]) and pathology (Pathology Foundation Survey[24], Free Lunch Pathology[25]) to robotics (Robotics Foundation Models[36]) and animal behavior analysis (Animal Behavior Vision[12], Elephant Vocalization Transfer[18]). Transfer Learning and Generalization investigates how knowledge acquired in one setting generalizes to new environments (Transfer Learning Code[11], Zero-Shot Dynamics Adaptation[35]), while Behavioral Foundation Models and Computer Vision Foundation Models address the architectures and pre-training regimes that underpin these systems. Federated and Distributed Adaptation (Federated Foundation Adaptation[9]) and Foundation Model Vulnerabilities and Security (Model Stealing Threats[16], Fine-tuning Compromises Safety[8]) round out the taxonomy by considering deployment constraints and adversarial risks. 
Across these branches, a recurring tension emerges between efficiency and expressiveness: parameter-efficient methods promise rapid, low-cost adaptation but may sacrifice task-specific performance, whereas full fine-tuning can achieve stronger alignment at the expense of computational overhead and potential safety degradation (Fine-tuning Compromises Safety[8]). Task Tokens[0] sits squarely within the Prompt-Based and Token-Based Adaptation cluster, proposing a mechanism to inject task-specific information without retraining the entire backbone—an approach closely related to Self-regulating Prompts[4] and Personalized Sequential Prompt[28], which similarly manipulate input representations to guide model behavior. Compared to Forward Pass Fine-tuning[3], which modifies activations during inference, Task Tokens[0] emphasizes learnable token embeddings that can be optimized offline and then deployed with minimal runtime cost. This positioning highlights an active line of inquiry: how to balance the modularity and scalability of prompt-based methods with the need for task-specific expressiveness, a question that also motivates recent work on parameter-efficient fine-tuning (Parameter-efficient Fine-tuning[29]) and domain-specific applications (Adapting LLMs Downstream[6]).

Claimed Contributions

Task Tokens method for adapting behavior foundation models

The authors propose Task Tokens, a novel approach that trains a task-specific encoder (tokenizer) to generate specialized token representations for each new task, while keeping the original behavior foundation model frozen. This enables task-specific adaptation without fine-tuning the entire foundation model, preserving its zero-shot capabilities and generalization.

5 retrieved papers
Parameter-efficient and fast-converging adaptation approach

The method achieves significant efficiency gains by requiring only approximately 200K trainable parameters per task (compared to millions in baseline methods) and demonstrates faster convergence during training. This makes the approach highly scalable for adapting foundation models to multiple downstream tasks.

10 retrieved papers
Can Refute
Hybrid control paradigm combining user-defined priors and learned optimization

The approach establishes a hybrid control framework where users can provide high-level behavioral priors via goals (such as "walk toward an object while facing forward"), which are then enhanced by task-specific embeddings learned through reinforcement learning to optimize dense rewards. This integration leverages the tokenization framework of goal-conditioned behavior foundation models.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Task Tokens method for adapting behavior foundation models

The authors propose Task Tokens, a novel approach that trains a task-specific encoder (tokenizer) to generate specialized token representations for each new task, while keeping the original behavior foundation model frozen. This enables task-specific adaptation without fine-tuning the entire foundation model, preserving its zero-shot capabilities and generalization.
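As a rough illustration (not the authors' implementation), the mechanism described above can be sketched as a small trainable encoder whose output token is prepended to the prompt sequence of a frozen backbone. All dimensions below, the tanh encoder, and the linear stand-in for the transformer BFM are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 64    # token embedding width (assumed)
OBS = 32  # task observation size (assumed)

# Frozen BFM stand-in: a fixed linear read-out over the token sequence.
# The real method uses a pretrained transformer policy here.
W_bfm = rng.standard_normal((D, D)) / np.sqrt(D)  # frozen, never updated

# Trainable task tokenizer: maps a task observation to one extra token.
W_tok = rng.standard_normal((OBS, D)) * 0.01      # the only trainable weights

def forward(prompt_tokens: np.ndarray, task_obs: np.ndarray) -> np.ndarray:
    """Prepend the learned task token to the prompt, run the frozen model."""
    task_token = np.tanh(task_obs @ W_tok)         # shape (D,)
    seq = np.vstack([task_token, prompt_tokens])   # shape (T+1, D)
    return seq.mean(axis=0) @ W_bfm                # stand-in for the BFM output

prompt = rng.standard_normal((3, D))  # e.g. goal-conditioning tokens
obs = rng.standard_normal(OBS)
out = forward(prompt, obs)
print(out.shape)  # (64,)
```

Only `W_tok` would receive gradients during task adaptation; the frozen backbone is shared across all tasks, which is what preserves its zero-shot behavior.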

Contribution

Parameter-efficient and fast-converging adaptation approach

The method achieves significant efficiency gains by requiring only approximately 200K trainable parameters per task (compared to millions in baseline methods) and demonstrates faster convergence during training. This makes the approach highly scalable for adapting foundation models to multiple downstream tasks.
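The ×125 figure quoted in the abstract, combined with the ~200K trainable parameters stated here, implies a baseline of roughly 25M trained parameters. A quick sanity check of that arithmetic, with a hypothetical MLP tokenizer whose layer sizes are chosen only to land in the reported ballpark:

```python
# Rough parameter accounting for the efficiency claim.
tokenizer_params = 200_000       # reported order of magnitude per task
backbone_params = 25_000_000     # hypothetical baseline implied by the x125 figure

reduction = backbone_params / tokenizer_params
print(f"x{reduction:.0f} fewer trainable parameters")  # x125

def mlp_param_count(dims):
    """Weights plus biases for a fully connected stack of the given widths."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

# Hypothetical tokenizer shape: obs -> hidden -> token embedding.
print(mlp_param_count([358, 384, 256]))  # ~0.24M, the right order of magnitude
```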

Contribution

Hybrid control paradigm combining user-defined priors and learned optimization

The approach establishes a hybrid control framework where users can provide high-level behavioral priors via goals (such as walk toward object while facing forward), which are then enhanced by task-specific embeddings learned through reinforcement learning to optimize dense rewards. This integration leverages the tokenization framework of goal-conditioned behavior foundation models.
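A minimal sketch of the hybrid paradigm, under stated assumptions: the user-defined prior and the learned task token are both fed to a frozen backbone, and only the task token is optimized against a dense reward. Random search stands in for the RL algorithm, and the linear backbone, goal vector, and reward are all illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16  # embedding width (assumed)

W_bfm = rng.standard_normal((D, D)) / np.sqrt(D)  # frozen backbone stand-in
W_frozen_copy = W_bfm.copy()                      # kept to verify it never changes

user_goal = rng.standard_normal(D)  # user-defined prior, e.g. "walk toward object"
target = rng.standard_normal(D)     # stand-in for the dense-reward optimum

def reward(task_token: np.ndarray) -> float:
    # Backbone conditioned on both the user prior and the learned token.
    out = (user_goal + task_token) @ W_bfm
    return -float(np.linalg.norm(out - target))   # dense reward: negative distance

# Random-search stand-in for RL: only the task token is ever updated.
task_token = np.zeros(D)
best = reward(task_token)
for _ in range(300):
    cand = task_token + 0.1 * rng.standard_normal(D)
    r = reward(cand)
    if r > best:
        task_token, best = cand, r

print(best >= reward(np.zeros(D)))  # True: learned token never hurts the prior
```

The design point this illustrates is the division of labor the contribution claims: the user supplies coarse intent through the goal prompt, while the learned embedding absorbs the fine-grained reward shaping, with the backbone untouched throughout.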