Expert Merging in Sparse Mixture of Experts with Nash Bargaining

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Mixture of Experts · Game Theory
Abstract:

Existing expert merging strategies for Sparse Mixture of Experts (SMoE) typically rely on input-dependent or input-independent averaging of expert parameters, but often lack a principled weighting mechanism. In this work, we reinterpret expert merging through the lens of game theory, revealing cooperative and competitive dynamics among experts. Based on this perspective, we introduce Nash Merging of Experts (NAMEx), a novel framework that incorporates Nash Bargaining into the merging process, enabling more balanced and efficient collaboration among experts. Additionally, we incorporate complex momentum into NAMEx to accelerate expert propagation with theoretical guarantees for convergence. Extensive experiments across language modeling, text classification, image classification, and zero-shot robustness under data corruption show that NAMEx consistently outperforms competing methods while integrating seamlessly with popular MoE architectures. Finally, we demonstrate NAMEx’s scalability by applying it to large-scale systems, including Qwen1.5-MoE (14B) and DeepSeek-MoE (16B), where it proves effective in both zero-shot and fine-tuning settings.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces NAMEx, a game-theoretic framework for merging experts in Sparse Mixture of Experts models using Nash Bargaining principles. It occupies a newly defined taxonomy leaf labeled 'Game-Theoretic Expert Merging' under the broader 'Expert Merging and Compression Techniques' branch. Notably, this leaf contains only the original paper itself, with zero sibling papers, indicating that game-theoretic approaches to expert merging represent a sparse and relatively unexplored research direction within the field's current structure.

The taxonomy reveals that expert merging research is concentrated in adjacent leaves such as 'Retraining-Free Expert Merging' (four papers using clustering or similarity methods) and 'Training-Based Merging and Upcycling' (four papers on dense-to-MoE transformations). These neighboring directions rely on heuristic averaging, hierarchical clustering, or parameter initialization strategies. The game-theoretic framing diverges by introducing cooperative bargaining dynamics, positioning the work at the intersection of merging techniques and optimization theory rather than purely empirical consolidation methods.

Across the thirty candidates retrieved through semantic search, the contribution-level analysis shows mixed novelty signals. For the core NAMEx framework and the complex momentum integration, ten candidates were examined each, and one refutable match was found per contribution, suggesting some overlap with prior optimization or merging work. For the game-theoretic interpretation, ten candidates were examined and none was refutable, indicating that this conceptual lens appears more distinctive within the limited search scope. These statistics reflect a focused rather than exhaustive literature review, leaving open the possibility of relevant work beyond the top thirty semantic matches.

Given the sparse taxonomy leaf and limited search scope, the game-theoretic framing appears relatively novel, while the technical components show modest prior overlap. The analysis covers top-ranked semantic neighbors but does not claim comprehensive field coverage, particularly for optimization techniques in adjacent domains that might employ similar bargaining or momentum strategies outside the MoE context.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 2

Research Landscape Overview

Core task: expert merging in sparse mixture of experts models. The field has evolved around several interconnected challenges: how to combine or compress experts without sacrificing performance, how to route tokens efficiently to the right experts, how to train and optimize these systems at scale, and how to deploy them in resource-constrained or domain-specific settings. The taxonomy reflects this structure through branches such as Expert Merging and Compression Techniques, which explores methods like hierarchical clustering (Hierarchical Clustering MoE[1], Hierarchical Clustering Merging[5]) and parameter upcycling (Upcycling Parameter Merging[11]); Routing Mechanisms and Expert Selection, which addresses dynamic gating strategies (Expert Choice Routing[14], Omni Router[7]); Training Paradigms and Optimization, covering dense-to-sparse training pipelines (Dense Training Sparse Inference[8]); and Domain-Specific MoE Applications, which tailors architectures to particular tasks (Action Specialized MoE[4]). Additional branches examine theoretical foundations, parameter-efficient designs, security concerns like intellectual property protection (RouteMark[18]), and computational efficiency for deployment.

A particularly active line of work focuses on reducing redundancy and improving expert utilization through merging and pruning strategies. Some approaches leverage clustering or manifold-based techniques (Stratified Manifold MoE[9], MergeME[17]) to consolidate similar experts, while others explore game-theoretic frameworks to balance competing objectives during merging. Nash Bargaining Expert Merging[0] sits within this game-theoretic subfield, offering a principled negotiation mechanism for combining experts that contrasts with simpler averaging or clustering heuristics seen in works like Hierarchical Clustering Merging[5].
Meanwhile, routing innovations such as Expert Race[3] and Layerwise Recurrent Router[2] emphasize dynamic, context-aware selection, highlighting a trade-off between merging fewer, more general experts versus maintaining many specialized ones with smarter routing. The original paper's emphasis on cooperative bargaining provides a middle ground, aiming to preserve expert diversity while achieving compression, a theme that resonates across efforts to balance model capacity, efficiency, and task performance.

Claimed Contributions

Nash Merging of Experts (NAMEx) framework

The authors propose NAMEx, a new expert merging method that reinterprets expert merging through game theory. It applies the Nash Bargaining Solution to determine merging coefficients based on each expert's contribution, treating expert domain vectors as utility functions in a cooperative-competitive game among experts.

10 retrieved papers · Can Refute
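To illustrate the bargaining idea behind this contribution, the toy sketch below merges expert parameter vectors with simplex weights chosen by gradient ascent on a Nash product. Everything concrete here is an assumption for illustration: the utility u_i = ⟨d_i, w⟩ (each expert's domain vector dotted with the merged expert), the softmax reparameterization, and the gradient-ascent solver are hypothetical stand-ins, not the paper's actual utilities or solution method.

```python
import numpy as np

def nash_merge(experts, domains, steps=500, lr=0.05):
    """Toy Nash-bargaining-style merge (hypothetical sketch).

    The merged expert is w = sum_i alpha_i * W_i with simplex weights
    alpha. Each expert's utility is taken as u_i = <d_i, w>, and alpha
    is found by gradient ascent on the log Nash product sum_i log u_i,
    using softmax logits theta so alpha always stays on the simplex.
    """
    W = np.stack(experts)            # (n, dim) expert parameter vectors
    D = np.stack(domains)            # (n, dim) expert "domain" vectors
    M = D @ W.T                      # M[i, j] = <d_i, W_j>
    theta = np.zeros(len(experts))   # softmax logits
    for _ in range(steps):
        e = np.exp(theta - theta.max())
        alpha = e / e.sum()
        u = np.clip(M @ alpha, 1e-8, None)   # utilities u_i = <d_i, w>
        g_alpha = (1.0 / u) @ M              # d(sum_i log u_i) / d alpha
        # chain rule through the softmax Jacobian
        g_theta = alpha * (g_alpha - alpha @ g_alpha)
        theta += lr * g_theta
    e = np.exp(theta - theta.max())
    alpha = e / e.sum()
    return alpha, alpha @ W          # merging weights, merged expert
```

Because the log Nash product is concave in alpha (a sum of logs of linear functions), this simple ascent from the uniform initialization can only trade weight toward experts whose utilities the bargain favors, rather than collapsing onto one expert outright.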
Complex momentum integration with theoretical convergence guarantees

The authors integrate complex momentum into the NAMEx framework to speed up expert propagation across layers. They provide theoretical analysis proving convergence under mild conditions and establish a spectral radius-based bound for the convergence rate of NAMEx-Momentum.

10 retrieved papers · Can Refute
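To make the momentum mechanics concrete, the sketch below runs a complex-momentum recursion on a 1-D quadratic, a deliberately simplified stand-in for cross-layer expert propagation, and computes the spectral radius of the linearized update map; the recursion converges exactly when that radius is below one, which is the shape of bound the contribution describes. The quadratic objective, step size, and the particular complex β are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def run_complex_momentum(beta, lr, h=1.0, x0=1.0, steps=400):
    """Complex momentum on f(x) = h * x^2 / 2: the velocity buffer is
    complex, but only its real part is applied to the iterate."""
    v, x = 0.0 + 0.0j, x0
    for _ in range(steps):
        v = beta * v + h * x      # complex momentum accumulation
        x = x - lr * v.real       # apply only the real part
    return x

def spectral_radius(beta, lr, h=1.0):
    """Spectral radius of the linearized one-step map on the state
    (Re v, Im v, x); the iteration converges iff this is below 1."""
    br, bi = beta.real, beta.imag
    A = np.array([
        [br, -bi, h],                       # Re v' = br*Re v - bi*Im v + h*x
        [bi, br, 0.0],                      # Im v' = bi*Re v + br*Im v
        [-lr * br, lr * bi, 1.0 - lr * h],  # x'    = x - lr * Re v'
    ])
    return float(np.abs(np.linalg.eigvals(A)).max())
```

With, say, β = 0.8·e^{iπ/8} and lr = 0.1, the spectral radius of this toy system sits below one, so the iterate decays geometrically at that rate, mirroring the spectral-radius-based convergence argument the contribution claims.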
Game-theoretic interpretation of expert merging dynamics

The authors provide a novel theoretical perspective by framing expert merging as a cooperative-competitive game among experts rather than simple parameter averaging. This game-theoretic lens reveals the intricate dynamics between experts and motivates the use of Nash Bargaining for principled weighting mechanisms.

10 retrieved papers
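For readers unfamiliar with the underlying concept, the snippet below works through the textbook two-player Nash Bargaining Solution on a linear utility frontier u1 + u2 = 1: the solution maximizes the product of gains over the disagreement point, which on this frontier yields an equal split of the surplus. This is a generic illustration of the bargaining principle the contribution invokes, not the paper's multi-expert formulation.

```python
import numpy as np

def nbs_split(d1, d2, total=1.0):
    """Two-player Nash Bargaining Solution on the frontier u1 + u2 = total:
    maximizing (u1 - d1) * (u2 - d2) gives each player its disagreement
    payoff plus half of the remaining surplus."""
    surplus = total - d1 - d2
    return d1 + surplus / 2.0, d2 + surplus / 2.0

# Brute-force check: maximize the Nash product on a grid over the frontier.
d1, d2 = 0.1, 0.3
u1 = np.linspace(d1, 1.0 - d2, 10001)
nash_product = (u1 - d1) * ((1.0 - u1) - d2)
u1_star = u1[np.argmax(nash_product)]   # agrees with the closed form
```

The closed form and the grid search coincide at u1 = 0.4, u2 = 0.6 for disagreement payoffs (0.1, 0.3): each side keeps its fallback and the remaining 0.6 of surplus is split evenly, which is the "principled weighting" intuition carried over to expert merging.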

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Nash Merging of Experts (NAMEx) framework

Contribution 2: Complex momentum integration with theoretical convergence guarantees

Contribution 3: Game-theoretic interpretation of expert merging dynamics