Expert Merging in Sparse Mixture of Experts with Nash Bargaining

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Mixture of Experts · Game Theory
Abstract:

Existing expert merging strategies for Sparse Mixture of Experts (SMoE) typically rely on input-dependent or input-independent averaging of expert parameters, but often lack a principled weighting mechanism. In this work, we reinterpret expert merging through the lens of game theory, revealing cooperative and competitive dynamics among experts. Based on this perspective, we introduce Nash Merging of Experts (NAMEx), a novel framework that incorporates Nash Bargaining into the merging process, enabling more balanced and efficient collaboration among experts. Additionally, we incorporate complex momentum into NAMEx to accelerate expert propagation with theoretical guarantees for convergence. Extensive experiments across language modeling, text classification, image classification, and zero-shot robustness under data corruption show that NAMEx consistently outperforms competing methods while integrating seamlessly with popular MoE architectures. Finally, we demonstrate NAMEx’s scalability by applying it to large-scale systems, including Qwen1.5-MoE (14B) and DeepSeek-MoE (16B), where it proves effective in both zero-shot and fine-tuning settings.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces NAMEx, a game-theoretic framework for merging experts in Sparse Mixture of Experts models using Nash Bargaining principles. It occupies a newly defined taxonomy leaf labeled 'Game-Theoretic Expert Merging' under the broader 'Expert Merging and Compression Techniques' branch. Notably, this leaf contains only the original paper itself, with zero sibling papers, indicating that game-theoretic approaches to expert merging represent a sparse and relatively unexplored research direction within the field's current structure.

The taxonomy reveals that expert merging research is concentrated in adjacent leaves such as 'Retraining-Free Expert Merging' (four papers using clustering or similarity methods) and 'Training-Based Merging and Upcycling' (four papers on dense-to-MoE transformations). These neighboring directions rely on heuristic averaging, hierarchical clustering, or parameter initialization strategies. The game-theoretic framing diverges by introducing cooperative bargaining dynamics, positioning the work at the intersection of merging techniques and optimization theory rather than purely empirical consolidation methods.

Across the thirty candidates retrieved through semantic search, the contribution-level analysis shows mixed novelty signals. For the core NAMEx framework and the complex momentum integration, ten candidates were examined each, and one refutable match was found per contribution, suggesting some overlap with prior optimization or merging work. For the game-theoretic interpretation, ten candidates were examined and none was refutable, indicating that this conceptual lens appears more distinctive within the limited search scope. These statistics reflect a focused rather than exhaustive literature review, leaving open the possibility of relevant work beyond the top thirty semantic matches.

Given the sparse taxonomy leaf and limited search scope, the game-theoretic framing appears relatively novel, while the technical components show modest prior overlap. The analysis covers top-ranked semantic neighbors but does not claim comprehensive field coverage, particularly for optimization techniques in adjacent domains that might employ similar bargaining or momentum strategies outside the MoE context.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 2

Research Landscape Overview

Core task: expert merging in sparse mixture of experts models. The field has evolved around several interconnected challenges: how to combine or compress experts without sacrificing performance, how to route tokens efficiently to the right experts, how to train and optimize these systems at scale, and how to deploy them in resource-constrained or domain-specific settings. The taxonomy reflects this structure through branches such as Expert Merging and Compression Techniques, which explores methods like hierarchical clustering (Hierarchical Clustering MoE[1], Hierarchical Clustering Merging[5]) and parameter upcycling (Upcycling Parameter Merging[11]); Routing Mechanisms and Expert Selection, which addresses dynamic gating strategies (Expert Choice Routing[14], Omni Router[7]); Training Paradigms and Optimization, covering dense-to-sparse training pipelines (Dense Training Sparse Inference[8]); and Domain-Specific MoE Applications, which tailors architectures to particular tasks (Action Specialized MoE[4]). Additional branches examine theoretical foundations, parameter-efficient designs, security concerns like intellectual property protection (RouteMark[18]), and computational efficiency for deployment.

A particularly active line of work focuses on reducing redundancy and improving expert utilization through merging and pruning strategies. Some approaches leverage clustering or manifold-based techniques (Stratified Manifold MoE[9], MergeME[17]) to consolidate similar experts, while others explore game-theoretic frameworks to balance competing objectives during merging. Nash Bargaining Expert Merging[0] sits within this game-theoretic subfield, offering a principled negotiation mechanism for combining experts that contrasts with simpler averaging or clustering heuristics seen in works like Hierarchical Clustering Merging[5].
Meanwhile, routing innovations such as Expert Race[3] and Layerwise Recurrent Router[2] emphasize dynamic, context-aware selection, highlighting a trade-off between merging fewer, more general experts versus maintaining many specialized ones with smarter routing. The original paper's emphasis on cooperative bargaining provides a middle ground, aiming to preserve expert diversity while achieving compression, a theme that resonates across efforts to balance model capacity, efficiency, and task performance.

Claimed Contributions

Nash Merging of Experts (NAMEx) framework

The authors propose NAMEx, a new expert merging method that reinterprets expert merging through game theory. It applies the Nash Bargaining Solution to determine merging coefficients based on each expert's contribution, treating expert domain vectors as utility functions in a cooperative-competitive game among experts.

10 retrieved papers · Can Refute
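To illustrate the bargaining idea behind this contribution, the toy sketch below merges expert parameter vectors with simplex weights chosen by gradient ascent on a Nash product. Everything concrete here is an assumption for illustration: the utility u_i = ⟨d_i, w⟩ (each expert's domain vector dotted with the merged expert), the softmax reparameterization, and the gradient-ascent solver are hypothetical stand-ins, not the paper's actual utilities or solution method.

```python
import numpy as np

def nash_merge(experts, domains, steps=500, lr=0.05):
    """Toy Nash-bargaining-style merge (hypothetical sketch).

    The merged expert is w = sum_i alpha_i * W_i with simplex weights
    alpha. Each expert's utility is taken as u_i = <d_i, w>, and alpha
    is found by gradient ascent on the log Nash product sum_i log u_i,
    using softmax logits theta so alpha always stays on the simplex.
    """
    W = np.stack(experts)            # (n, dim) expert parameter vectors
    D = np.stack(domains)            # (n, dim) expert "domain" vectors
    M = D @ W.T                      # M[i, j] = <d_i, W_j>
    theta = np.zeros(len(experts))   # softmax logits
    for _ in range(steps):
        e = np.exp(theta - theta.max())
        alpha = e / e.sum()
        u = np.clip(M @ alpha, 1e-8, None)   # utilities u_i = <d_i, w>
        g_alpha = (1.0 / u) @ M              # d(sum_i log u_i) / d alpha
        # chain rule through the softmax Jacobian
        g_theta = alpha * (g_alpha - alpha @ g_alpha)
        theta += lr * g_theta
    e = np.exp(theta - theta.max())
    alpha = e / e.sum()
    return alpha, alpha @ W          # merging weights, merged expert
```

Because the log Nash product is concave in alpha (a sum of logs of linear functions), this simple ascent from the uniform initialization can only trade weight toward experts whose utilities the bargain favors, rather than collapsing onto one expert outright.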
Complex momentum integration with theoretical convergence guarantees

The authors integrate complex momentum into the NAMEx framework to speed up expert propagation across layers. They provide theoretical analysis proving convergence under mild conditions and establish a spectral radius-based bound for the convergence rate of NAMEx-Momentum.

10 retrieved papers · Can Refute
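To make the momentum mechanics concrete, the sketch below runs a complex-momentum recursion on a 1-D quadratic, a deliberately simplified stand-in for cross-layer expert propagation, and computes the spectral radius of the linearized update map; the recursion converges exactly when that radius is below one, which is the shape of bound the contribution describes. The quadratic objective, step size, and the particular complex β are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def run_complex_momentum(beta, lr, h=1.0, x0=1.0, steps=400):
    """Complex momentum on f(x) = h * x^2 / 2: the velocity buffer is
    complex, but only its real part is applied to the iterate."""
    v, x = 0.0 + 0.0j, x0
    for _ in range(steps):
        v = beta * v + h * x      # complex momentum accumulation
        x = x - lr * v.real       # apply only the real part
    return x

def spectral_radius(beta, lr, h=1.0):
    """Spectral radius of the linearized one-step map on the state
    (Re v, Im v, x); the iteration converges iff this is below 1."""
    br, bi = beta.real, beta.imag
    A = np.array([
        [br, -bi, h],                       # Re v' = br*Re v - bi*Im v + h*x
        [bi, br, 0.0],                      # Im v' = bi*Re v + br*Im v
        [-lr * br, lr * bi, 1.0 - lr * h],  # x'    = x - lr * Re v'
    ])
    return float(np.abs(np.linalg.eigvals(A)).max())
```

With, say, β = 0.8·e^{iπ/8} and lr = 0.1, the spectral radius of this toy system sits below one, so the iterate decays geometrically at that rate, mirroring the spectral-radius-based convergence argument the contribution claims.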
Game-theoretic interpretation of expert merging dynamics

The authors provide a novel theoretical perspective by framing expert merging as a cooperative-competitive game among experts rather than simple parameter averaging. This game-theoretic lens reveals the intricate dynamics between experts and motivates the use of Nash Bargaining for principled weighting mechanisms.

10 retrieved papers
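For readers unfamiliar with the underlying concept, the snippet below works through the textbook two-player Nash Bargaining Solution on a linear utility frontier u1 + u2 = 1: the solution maximizes the product of gains over the disagreement point, which on this frontier yields an equal split of the surplus. This is a generic illustration of the bargaining principle the contribution invokes, not the paper's multi-expert formulation.

```python
import numpy as np

def nbs_split(d1, d2, total=1.0):
    """Two-player Nash Bargaining Solution on the frontier u1 + u2 = total:
    maximizing (u1 - d1) * (u2 - d2) gives each player its disagreement
    payoff plus half of the remaining surplus."""
    surplus = total - d1 - d2
    return d1 + surplus / 2.0, d2 + surplus / 2.0

# Brute-force check: maximize the Nash product on a grid over the frontier.
d1, d2 = 0.1, 0.3
u1 = np.linspace(d1, 1.0 - d2, 10001)
nash_product = (u1 - d1) * ((1.0 - u1) - d2)
u1_star = u1[np.argmax(nash_product)]   # agrees with the closed form
```

The closed form and the grid search coincide at u1 = 0.4, u2 = 0.6 for disagreement payoffs (0.1, 0.3): each side keeps its fallback and the remaining 0.6 of surplus is split evenly, which is the "principled weighting" intuition carried over to expert merging.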

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Nash Merging of Experts (NAMEx) framework

Contribution 2: Complex momentum integration with theoretical convergence guarantees

Contribution 3: Game-theoretic interpretation of expert merging dynamics