Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: Imitation Learning, Mixture of Experts
Abstract:

Diffusion-based policies have recently shown strong results in robot manipulation, but their extension to multi-task scenarios is hindered by the high cost of scaling model size and demonstrations. We introduce Skill Mixture-of-Experts Policy (SMP), a diffusion-based mixture-of-experts policy that learns a compact orthogonal skill basis and uses sticky routing to compose actions from a small, task-relevant subset of experts at each step. A variational training objective supports this design, and adaptive expert activation at inference yields fast sampling without oversized backbones. We validate SMP in simulation and on a real dual-arm platform across multi-task and transfer learning tasks, where it achieves higher success rates and markedly lower inference cost than large diffusion baselines. These results indicate a practical path toward scalable, transferable multi-task manipulation: learn reusable skills once, activate only what is needed, and adapt quickly when tasks change.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Skill Mixture-of-Experts Policy (SMP), a diffusion-based MoE architecture that learns orthogonal skill bases and uses sticky routing to compose actions from task-relevant expert subsets. It resides in the 'Skill-Based MoE Diffusion Policies' leaf, which contains only two papers total (including this one). This leaf sits within the broader 'Mixture-of-Experts Integration in Diffusion Policies' branch, indicating a relatively sparse but active research direction focused on embedding MoE structures directly into diffusion policy frameworks for multi-task manipulation.

The taxonomy reveals several neighboring approaches: 'Denoiser-Level MoE' applies MoE to denoising transformers rather than skill decomposition, 'Language-Conditioned MoE' uses language instructions for routing, and 'Sparse Diffusion Policies' achieves efficiency through pruning rather than explicit expert specialization. Adjacent branches explore 'Distillation Methods' and 'Flow-Matching Alternatives', while more distant nodes address dexterous manipulation, locomotion, and vision-language-action models. The paper's focus on orthogonal skill bases and sticky routing distinguishes it from these related but structurally different approaches to multi-task learning.

Among fifteen candidates examined across three contributions, no clearly refutable prior work was identified. For the core SMP architecture, three candidates were examined with zero refutations; for the adaptive expert activation strategy, ten candidates with zero refutations; and for the variational training objective with sticky routing, two candidates with zero refutations. This suggests that within the limited search scope (primarily top-K semantic matches and citation expansion), the specific combination of orthogonal skill learning, sticky routing, and adaptive activation appears relatively unexplored, though the broader MoE-diffusion paradigm is established.

Based on the limited literature search, the work appears to occupy a distinctive position within skill-based MoE diffusion policies, particularly in its integration of sticky routing and adaptive activation. However, the analysis covers only fifteen candidates from semantic search, not an exhaustive survey of all multi-task manipulation or MoE literature. The sparse population of the immediate taxonomy leaf (two papers) and absence of refutable candidates suggest novelty in the specific technical approach, though broader claims would require more comprehensive coverage.

Taxonomy

Core-task Taxonomy Papers: 21
Claimed Contributions: 3
Contribution Candidate Papers Compared: 15
Refutable Papers: 0

Research Landscape Overview

Core task: multi-task robot manipulation with diffusion-based mixture-of-experts policies. The field combines two powerful paradigms: diffusion models for generating smooth, multimodal action distributions, and mixture-of-experts (MoE) architectures for decomposing complex multi-task problems into specialized sub-policies.

The taxonomy reflects a rich landscape organized around several complementary themes. One major branch focuses on diffusion policy architectures themselves, exploring how to integrate MoE gating and skill decomposition directly into the generative process (e.g., Sparse Diffusion Policy[3], MoE-DP Skill Decomposition[14]). Adjacent branches examine flow-matching alternatives (Variational Flow-Matching Policy[7]) and representation learning strategies (Spatially-Grounded Representations[8]) that provide the perceptual backbone for these policies. Other directions address dexterous manipulation (Dexterous Pre-Grasp Diffusion[5], UniDexFPM[17]), residual learning frameworks (Residual MoE Grasping[12]), and broader multi-task reinforcement learning with MoE (Attention MoE MTRL[10]). Additional branches cover locomotion (MoE Locomotion[9]), hybrid dynamical systems (Adaptive Diffusion Hybrid[16]), skill composition (Local MoE Skills[13]), and vision-language-action models (VLA Models Survey[11]), alongside survey literature (Diffusion Policy Survey[1], Diffusion Robotics Review[18]) that contextualizes these developments.

Within this landscape, a particularly active line of work centers on skill-based MoE diffusion policies, where the goal is to learn a set of expert diffusion models that specialize in distinct manipulation primitives and combine them via learned gating mechanisms. MoE Diffusion Skills[0] sits squarely in this cluster, emphasizing the decomposition of multi-task manipulation into interpretable skill modules within a unified diffusion framework. This approach contrasts with methods like Sparse Diffusion Policy[3], which uses sparsity to prune unnecessary network capacity rather than explicitly modeling skill boundaries, and with MoE-DP Skill Decomposition[14], which also pursues skill-level factorization but may differ in how experts are trained or gated.

A key open question across these works is how to balance the expressiveness of individual expert policies against the complexity of the gating network, and whether skill discovery should be supervised, emergent, or guided by auxiliary objectives. By integrating MoE structure directly into the diffusion denoising process, MoE Diffusion Skills[0] aims to achieve both high performance on diverse tasks and interpretable specialization, positioning it as a representative example of this skill-based MoE diffusion paradigm.

Claimed Contributions

Skill Mixture-of-Experts Policy (SMP)

SMP is a diffusion-based mixture-of-experts framework that explicitly abstracts reusable manipulation skills via a state-dependent orthonormal action basis with sticky routing. This design improves performance across multiple tasks by learning disentangled, phase-consistent behaviors that can be reused and transferred.
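For intuition, the described composition (a state-dependent orthonormal action basis combined under sticky routing) can be sketched as below. The dimensions, the QR-based orthonormalization, and the convex-blend form of stickiness are illustrative assumptions, not the paper's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

ACTION_DIM, N_SKILLS = 7, 4   # illustrative sizes, not taken from the paper
STICKINESS = 0.8              # hypothetical persistence weight for sticky routing

def orthonormal_basis(raw):
    """Orthonormalize raw skill directions via QR (one plausible construction)."""
    q, _ = np.linalg.qr(raw)          # reduced QR: columns of q are orthonormal
    return q[:, :N_SKILLS]

def sticky_gate(prev_gate, router_logits):
    """Blend the previous gate with fresh router probabilities so the
    currently active skill tends to persist across consecutive steps."""
    fresh = np.exp(router_logits - router_logits.max())
    fresh /= fresh.sum()              # softmax over experts
    return STICKINESS * prev_gate + (1.0 - STICKINESS) * fresh

# A state-dependent basis would come from a network; here a fixed random one.
basis = orthonormal_basis(rng.standard_normal((ACTION_DIM, N_SKILLS)))
gate = np.full(N_SKILLS, 1.0 / N_SKILLS)   # start from a uniform gate

for step in range(3):
    logits = rng.standard_normal(N_SKILLS)        # stand-in for a learned router
    gate = sticky_gate(gate, logits)
    coeffs = gate * rng.standard_normal(N_SKILLS) # per-skill coefficients
    action = basis @ coeffs                       # action composed in the skill basis
    print(step, np.round(action, 3))
```

Because the basis columns are orthonormal, each skill contributes along a disentangled direction of action space, and the sticky blend keeps gate weights from switching abruptly between consecutive steps.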

3 retrieved papers
Adaptive expert activation strategy

An inference-time mechanism that selects only a small, state-dependent subset of experts (via top-k or coverage selection) to activate at each step. This reduces active parameters and latency substantially while preserving policy quality.
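The two selection rules named above (top-k and coverage) can be sketched as follows; the function name, thresholds, and example gate values are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def select_experts(gate, k=2, coverage=0.9):
    """Pick a small expert subset in one of two ways: the k largest gates,
    or the smallest prefix of experts whose cumulative gate mass reaches
    `coverage`. Hypothetical illustration of the selection rule."""
    order = np.argsort(gate)[::-1]           # experts sorted by gate, descending
    topk = order[:k]
    csum = np.cumsum(gate[order])
    n_cov = int(np.searchsorted(csum, coverage) + 1)
    cover = order[:n_cov]
    return topk, cover

gate = np.array([0.55, 0.30, 0.10, 0.05])    # example gate distribution
topk, cover = select_experts(gate)
print(topk)   # indices of the two largest gates
print(cover)  # smallest prefix covering 90% of the gate mass
```

In practice only the selected experts would be evaluated, with their gate weights renormalized to sum to one, which is where the reduction in active parameters and latency comes from.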

10 retrieved papers
Variational training objective with sticky routing

A principled variational lower-bound formulation that combines reconstruction in a whitened basis, gate regularization via sticky Dirichlet Markov dynamics, and router alignment. This objective enables stable training of the orthonormal skill basis and phase-consistent gating.
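For orientation, one way the three described terms could fit together in a single objective is sketched below. The symbols (actions a_t, basis B, coefficients c_t, whitening W, gates z_t, router r_phi) and the exact form of each term are assumptions for illustration, not the paper's actual objective:

```latex
% Illustrative decomposition of the described objective (notation assumed):
\mathcal{L} =
\underbrace{\mathbb{E}_{q}\!\left[\lVert W (a_t - B c_t) \rVert^2\right]}_{\text{reconstruction in whitened basis}}
+ \underbrace{\mathrm{KL}\!\left( q(z_t \mid s_t) \,\Vert\, \mathrm{Dir}(\alpha z_{t-1} + \beta) \right)}_{\text{sticky Dirichlet Markov gate prior}}
+ \underbrace{\lambda \, \mathrm{CE}\!\left( r_\phi(s_t),\, z_t \right)}_{\text{router alignment}}
```

Under such a decomposition, the Dirichlet prior centered on the previous gate z_{t-1} is what would make routing "sticky", while the alignment term ties the amortized router to the inferred gates.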

2 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Skill Mixture-of-Experts Policy (SMP)

SMP is a diffusion-based mixture-of-experts framework that explicitly abstracts reusable manipulation skills via a state-dependent orthonormal action basis with sticky routing. This design improves performance across multiple tasks by learning disentangled, phase-consistent behaviors that can be reused and transferred.

Contribution

Adaptive expert activation strategy

An inference-time mechanism that selects only a small, state-dependent subset of experts (via top-k or coverage selection) to activate at each step. This reduces active parameters and latency substantially while preserving policy quality.

Contribution

Variational training objective with sticky routing

A principled variational lower-bound formulation that combines reconstruction in a whitened basis, gate regularization via sticky Dirichlet Markov dynamics, and router alignment. This objective enables stable training of the orthonormal skill basis and phase-consistent gating.