Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: Imitation Learning, Mixture of Experts
Abstract:

Diffusion-based policies have recently shown strong results in robot manipulation, but their extension to multi-task scenarios is hindered by the high cost of scaling model size and demonstrations. We introduce Skill Mixture-of-Experts Policy (SMP), a diffusion-based mixture-of-experts policy that learns a compact orthogonal skill basis and uses sticky routing to compose actions from a small, task-relevant subset of experts at each step. A variational training objective supports this design, and adaptive expert activation at inference yields fast sampling without oversized backbones. We validate SMP in simulation and on a real dual-arm platform across multi-task and transfer learning tasks, where it achieves higher success rates and markedly lower inference cost than large diffusion baselines. These results indicate a practical path toward scalable, transferable multi-task manipulation: learn reusable skills once, activate only what is needed, and adapt quickly when tasks change.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Skill Mixture-of-Experts Policy (SMP), a diffusion-based MoE architecture that learns orthogonal skill bases and uses sticky routing to compose actions from task-relevant expert subsets. It resides in the 'Skill-Based MoE Diffusion Policies' leaf, which contains only two papers total (including this one). This leaf sits within the broader 'Mixture-of-Experts Integration in Diffusion Policies' branch, indicating a relatively sparse but active research direction focused on embedding MoE structures directly into diffusion policy frameworks for multi-task manipulation.

The taxonomy reveals several neighboring approaches: 'Denoiser-Level MoE' applies MoE to denoising transformers rather than skill decomposition, 'Language-Conditioned MoE' uses language instructions for routing, and 'Sparse Diffusion Policies' achieves efficiency through pruning rather than explicit expert specialization. Adjacent branches explore 'Distillation Methods' and 'Flow-Matching Alternatives', while more distant nodes address dexterous manipulation, locomotion, and vision-language-action models. The paper's focus on orthogonal skill bases and sticky routing distinguishes it from these related but structurally different approaches to multi-task learning.

Among fifteen candidates examined across three contributions, no clearly refutable prior work was identified. For the core SMP architecture, three candidates were examined with zero refutations; for the adaptive expert activation strategy, ten candidates with zero refutations; and for the variational training objective with sticky routing, two candidates with zero refutations. This suggests that within the limited search scope (primarily top-K semantic matches and citation expansion), the specific combination of orthogonal skill learning, sticky routing, and adaptive activation appears relatively unexplored, though the broader MoE-diffusion paradigm is established.

Based on the limited literature search, the work appears to occupy a distinctive position within skill-based MoE diffusion policies, particularly in its integration of sticky routing and adaptive activation. However, the analysis covers only fifteen candidates from semantic search, not an exhaustive survey of all multi-task manipulation or MoE literature. The sparse population of the immediate taxonomy leaf (two papers) and absence of refutable candidates suggest novelty in the specific technical approach, though broader claims would require more comprehensive coverage.

Taxonomy

Core-task Taxonomy Papers: 21
Claimed Contributions: 3
Contribution Candidate Papers Compared: 15
Refutable Papers: 0

Research Landscape Overview

Core task: multi-task robot manipulation with diffusion-based mixture-of-experts policies. The field combines two powerful paradigms: diffusion models for generating smooth, multimodal action distributions, and mixture-of-experts (MoE) architectures for decomposing complex multi-task problems into specialized sub-policies.

The taxonomy reflects a rich landscape organized around several complementary themes. One major branch focuses on diffusion policy architectures themselves, exploring how to integrate MoE gating and skill decomposition directly into the generative process (e.g., Sparse Diffusion Policy[3], MoE-DP Skill Decomposition[14]). Adjacent branches examine flow-matching alternatives (Variational Flow-Matching Policy[7]) and representation learning strategies (Spatially-Grounded Representations[8]) that provide the perceptual backbone for these policies. Other directions address dexterous manipulation (Dexterous Pre-Grasp Diffusion[5], UniDexFPM[17]), residual learning frameworks (Residual MoE Grasping[12]), and broader multi-task reinforcement learning with MoE (Attention MoE MTRL[10]). Additional branches cover locomotion (MoE Locomotion[9]), hybrid dynamical systems (Adaptive Diffusion Hybrid[16]), skill composition (Local MoE Skills[13]), and vision-language-action models (VLA Models Survey[11]), alongside survey literature (Diffusion Policy Survey[1], Diffusion Robotics Review[18]) that contextualizes these developments.

Within this landscape, a particularly active line of work centers on skill-based MoE diffusion policies, where the goal is to learn a set of expert diffusion models that specialize in distinct manipulation primitives and combine them via learned gating mechanisms. MoE Diffusion Skills[0] sits squarely in this cluster, emphasizing the decomposition of multi-task manipulation into interpretable skill modules within a unified diffusion framework. This approach contrasts with methods like Sparse Diffusion Policy[3], which uses sparsity to prune unnecessary network capacity rather than explicitly modeling skill boundaries, and with MoE-DP Skill Decomposition[14], which also pursues skill-level factorization but may differ in how experts are trained or gated.

A key open question across these works is how to balance the expressiveness of individual expert policies against the complexity of the gating network, and whether skill discovery should be supervised, emergent, or guided by auxiliary objectives. By integrating MoE structure directly into the diffusion denoising process, MoE Diffusion Skills[0] aims to achieve both high performance on diverse tasks and interpretable specialization, positioning it as a representative example of this skill-based MoE diffusion paradigm.

Claimed Contributions

Skill Mixture-of-Experts Policy (SMP)

SMP is a diffusion-based mixture-of-experts framework that explicitly abstracts reusable manipulation skills via a state-dependent orthonormal action basis with sticky routing. This design improves performance across multiple tasks by learning disentangled, phase-consistent behaviors that can be reused and transferred.
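For intuition, the described composition (a state-dependent orthonormal action basis combined under sticky routing) can be sketched as below. The dimensions, the QR-based orthonormalization, and the convex-blend form of stickiness are illustrative assumptions, not the paper's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

ACTION_DIM, N_SKILLS = 7, 4   # illustrative sizes, not taken from the paper
STICKINESS = 0.8              # hypothetical persistence weight for sticky routing

def orthonormal_basis(raw):
    """Orthonormalize raw skill directions via QR (one plausible construction)."""
    q, _ = np.linalg.qr(raw)          # reduced QR: columns of q are orthonormal
    return q[:, :N_SKILLS]

def sticky_gate(prev_gate, router_logits):
    """Blend the previous gate with fresh router probabilities so the
    currently active skill tends to persist across consecutive steps."""
    fresh = np.exp(router_logits - router_logits.max())
    fresh /= fresh.sum()              # softmax over experts
    return STICKINESS * prev_gate + (1.0 - STICKINESS) * fresh

# A state-dependent basis would come from a network; here a fixed random one.
basis = orthonormal_basis(rng.standard_normal((ACTION_DIM, N_SKILLS)))
gate = np.full(N_SKILLS, 1.0 / N_SKILLS)   # start from a uniform gate

for step in range(3):
    logits = rng.standard_normal(N_SKILLS)        # stand-in for a learned router
    gate = sticky_gate(gate, logits)
    coeffs = gate * rng.standard_normal(N_SKILLS) # per-skill coefficients
    action = basis @ coeffs                       # action composed in the skill basis
    print(step, np.round(action, 3))
```

Because the basis columns are orthonormal, each skill contributes along a disentangled direction of action space, and the sticky blend keeps gate weights from switching abruptly between consecutive steps.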

3 retrieved papers
Adaptive expert activation strategy

An inference-time mechanism that selects only a small, state-dependent subset of experts (via top-k or coverage selection) to activate at each step. This reduces active parameters and latency substantially while preserving policy quality.
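The two selection rules named above (top-k and coverage) can be sketched as follows; the function name, thresholds, and example gate values are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def select_experts(gate, k=2, coverage=0.9):
    """Pick a small expert subset in one of two ways: the k largest gates,
    or the smallest prefix of experts whose cumulative gate mass reaches
    `coverage`. Hypothetical illustration of the selection rule."""
    order = np.argsort(gate)[::-1]           # experts sorted by gate, descending
    topk = order[:k]
    csum = np.cumsum(gate[order])
    n_cov = int(np.searchsorted(csum, coverage) + 1)
    cover = order[:n_cov]
    return topk, cover

gate = np.array([0.55, 0.30, 0.10, 0.05])    # example gate distribution
topk, cover = select_experts(gate)
print(topk)   # indices of the two largest gates
print(cover)  # smallest prefix covering 90% of the gate mass
```

In practice only the selected experts would be evaluated, with their gate weights renormalized to sum to one, which is where the reduction in active parameters and latency comes from.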

10 retrieved papers
Variational training objective with sticky routing

A principled variational lower-bound formulation that combines reconstruction in a whitened basis, gate regularization via sticky Dirichlet Markov dynamics, and router alignment. This objective enables stable training of the orthonormal skill basis and phase-consistent gating.
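For orientation, one way the three described terms could fit together in a single objective is sketched below. The symbols (actions a_t, basis B, coefficients c_t, whitening W, gates z_t, router r_phi) and the exact form of each term are assumptions for illustration, not the paper's actual objective:

```latex
% Illustrative decomposition of the described objective (notation assumed):
\mathcal{L} =
\underbrace{\mathbb{E}_{q}\!\left[\lVert W (a_t - B c_t) \rVert^2\right]}_{\text{reconstruction in whitened basis}}
+ \underbrace{\mathrm{KL}\!\left( q(z_t \mid s_t) \,\Vert\, \mathrm{Dir}(\alpha z_{t-1} + \beta) \right)}_{\text{sticky Dirichlet Markov gate prior}}
+ \underbrace{\lambda \, \mathrm{CE}\!\left( r_\phi(s_t),\, z_t \right)}_{\text{router alignment}}
```

Under such a decomposition, the Dirichlet prior centered on the previous gate z_{t-1} is what would make routing "sticky", while the alignment term ties the amortized router to the inferred gates.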

2 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Skill Mixture-of-Experts Policy (SMP)

SMP is a diffusion-based mixture-of-experts framework that explicitly abstracts reusable manipulation skills via a state-dependent orthonormal action basis with sticky routing. This design improves performance across multiple tasks by learning disentangled, phase-consistent behaviors that can be reused and transferred.

Contribution

Adaptive expert activation strategy

An inference-time mechanism that selects only a small, state-dependent subset of experts (via top-k or coverage selection) to activate at each step. This reduces active parameters and latency substantially while preserving policy quality.

Contribution

Variational training objective with sticky routing

A principled variational lower-bound formulation that combines reconstruction in a whitened basis, gate regularization via sticky Dirichlet Markov dynamics, and router alignment. This objective enables stable training of the orthonormal skill basis and phase-consistent gating.