Diverse and Sparse Mixture-of-Experts for Causal Subgraph–Based Out-of-Distribution Graph Learning

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: graph neural network; out-of-distribution learning; mixture-of-experts
Abstract:

Current state-of-the-art methods for out-of-distribution (OOD) generalization struggle with datasets whose causal subgraphs are heterogeneous at the instance level. Existing approaches that attempt to handle such heterogeneity either rely on data augmentation, which risks altering label semantics, or impose causal assumptions whose validity in real-world datasets is uncertain. We introduce a Mixture-of-Experts (MoE) framework that models heterogeneous causal subgraphs without restrictive assumptions. Our key idea is to address instance-level heterogeneity by enforcing semantic diversity among experts, each generating a distinct causal subgraph, while a learned gate assigns sparse weights that adaptively focus on the most relevant experts for each input. Our theoretical analysis shows that these two properties jointly reduce OOD error. In practice, our experts are scalable and require no environment labels. Empirically, our framework achieves strong performance on the GOOD benchmark across both synthetic and real-world structural shifts.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 23
Refutable Paper: 1
Research Landscape Overview

Core task: out-of-distribution generalization on graphs with heterogeneous causal subgraphs. The field addresses how graph neural networks can reliably generalize when test distributions differ from training, particularly when causal mechanisms vary across subgraphs. The taxonomy reveals several complementary research directions.

Causal Subgraph Extraction and Invariance Learning focuses on identifying and leveraging stable causal patterns, with works like Causal Attention Learning[2] and Debiasing Causal Substructure[10] extracting invariant substructures. Environment-Based Causal Modeling emphasizes partitioning data into environments to learn invariances, as seen in Joint Label Environment[17] and Soft Causal Environment[41]. Structural Causal Models and Theoretical Frameworks provide foundational principles, while Invariance Principles and Unified Learning Frameworks, exemplified by Unified Invariant Learning[32], seek to consolidate these ideas. Domain-Specific Applications tailor methods to contexts like molecular graphs (Shift-Robust Molecular Learning[11]) or brain networks (BrainOOD[4]), and Meta-Learning approaches such as Meta-learning Domain Generalization[24] adapt architectures dynamically. Explainability branches like Reinforced Causal Explainer[36] ensure interpretability, while Auxiliary Techniques incorporate complementary strategies such as data augmentation or contrastive learning.

A particularly active line of work centers on aggregating multiple causal subgraphs to capture diverse invariant patterns, addressing the challenge that a single causal mechanism may be insufficient for complex graphs. Diverse Sparse MoE[0] sits within this cluster, emphasizing diversity in causal subgraph discovery through mixture-of-experts architectures.
This contrasts with nearby works like Subgraph Aggregation OOD[8], which aggregates multiple subgraphs but may not explicitly enforce diversity, and PISA[40], which focuses on prototype-based invariant subgraph aggregation. The trade-off involves balancing the richness of multiple causal views against computational complexity and the risk of capturing spurious correlations. Open questions include how to optimally combine heterogeneous causal signals and whether diversity mechanisms genuinely improve robustness or introduce redundancy. Diverse Sparse MoE[0] contributes by proposing sparsity-driven expert selection to maintain diversity while controlling model complexity, positioning it as a methodological refinement within the broader effort to handle causal heterogeneity in graph OOD generalization.

Claimed Contributions

Theoretical justification for MoE in graph OOD learning via risk bound decomposition

The authors derive a formal OOD risk bound that decomposes error into coverage and selection terms, proving that semantic diversity among experts (ensuring coverage of causal mechanisms) and instance-level sparsity in gating (enabling correct expert selection) together reduce out-of-distribution generalization error.
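The claimed decomposition can be sketched schematically. The symbols below are illustrative placeholders chosen here for exposition, not the paper's actual notation, constants, or proof:

```latex
\mathcal{R}_{\mathrm{OOD}}(f)
  \;\le\;
  \mathcal{R}_{\mathrm{ID}}(f)
  \;+\;
  \underbrace{\epsilon_{\mathrm{cov}}}_{\substack{\text{coverage: some expert's subgraph}\\ \text{matches each causal mechanism}}}
  \;+\;
  \underbrace{\epsilon_{\mathrm{sel}}}_{\substack{\text{selection: the sparse gate}\\ \text{routes to that expert}}}
```

Under this reading, enforcing semantic diversity among experts drives $\epsilon_{\mathrm{cov}}$ down (more causal mechanisms are covered by some expert), while sparser, better-calibrated gating drives $\epsilon_{\mathrm{sel}}$ down (the covering expert actually receives the weight), which is how the two properties would jointly tighten the bound.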

10 retrieved papers
Causal subgraph-based MoE framework without environment labels or strong causal assumptions

The authors propose a practical MoE architecture where experts extract diverse causal subgraphs using a decorrelation regularizer and a learned gating network adaptively selects relevant experts, avoiding the need for environment labels or restrictive causal independence assumptions common in prior work.
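As a rough illustration of how such a decorrelation regularizer could operate, the sketch below penalizes pairwise correlation between per-expert embeddings. The function name, shapes, and penalty form are assumptions made here for illustration, not the paper's implementation:

```python
import numpy as np

def decorrelation_penalty(expert_embs: np.ndarray) -> float:
    """Penalize pairwise correlation between expert embeddings.

    expert_embs: shape (num_experts, dim), one embedding per expert
    for a single input graph (layout is an illustrative assumption).
    """
    # Center each expert's embedding and scale it to unit norm.
    centered = expert_embs - expert_embs.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True) + 1e-8
    unit = centered / norms
    # Gram matrix of cosine similarities between experts.
    gram = unit @ unit.T
    # Sum of squared off-diagonal entries: zero iff the experts'
    # centered embeddings are pairwise orthogonal.
    off_diag = gram - np.diag(np.diag(gram))
    return float((off_diag ** 2).sum())

# Two identical experts are heavily penalized; two experts whose
# centered embeddings are orthogonal incur (near-)zero penalty.
same = np.tile(np.array([[1.0, 2.0, 3.0]]), (2, 1))
ortho = np.array([[1.0, 0.0, -1.0], [1.0, -2.0, 1.0]])
```

A penalty of this form is minimized when expert representations are pairwise decorrelated, which is one standard way to push experts toward encoding distinct semantics, here, distinct causal-subgraph hypotheses.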

3 retrieved papers
Can Refute
Novel Mixture-of-Experts framework for modeling heterogeneous causal subgraphs at instance level

The authors introduce a MoE framework specifically designed to handle instance-level heterogeneity in causal subgraphs by allowing multiple experts to generate distinct causal hypotheses, with sparse gating adaptively focusing on the most relevant experts for each input graph.
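Instance-level sparse gating of the kind described is often realized as top-k routing. The sketch below shows that generic pattern, assumed here for illustration; the paper's gate may differ:

```python
import numpy as np

def sparse_gate(logits: np.ndarray, k: int = 2) -> np.ndarray:
    """Top-k sparse gating: softmax over the k largest logits,
    exactly zero weight on the rest (a common MoE routing scheme)."""
    top = np.argsort(logits)[-k:]            # indices of the k largest logits
    weights = np.zeros_like(logits, dtype=float)
    exp = np.exp(logits[top] - logits[top].max())  # stable softmax on top-k
    weights[top] = exp / exp.sum()
    return weights

# Per-instance logits (e.g., from a gating network applied to a graph
# embedding) yield a weight vector supported on only k experts.
w = sparse_gate(np.array([2.0, -1.0, 0.5, 1.5]), k=2)
```

Because the logits are computed per input, different graphs activate different subsets of experts, which is what lets the mixture adapt to instance-level heterogeneity while keeping inference cost roughly constant in k rather than in the total number of experts.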

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Theoretical justification for MoE in graph OOD learning via risk bound decomposition

The authors derive a formal OOD risk bound that decomposes error into coverage and selection terms, proving that semantic diversity among experts (ensuring coverage of causal mechanisms) and instance-level sparsity in gating (enabling correct expert selection) together reduce out-of-distribution generalization error.

Contribution

Causal subgraph-based MoE framework without environment labels or strong causal assumptions

The authors propose a practical MoE architecture where experts extract diverse causal subgraphs using a decorrelation regularizer and a learned gating network adaptively selects relevant experts, avoiding the need for environment labels or restrictive causal independence assumptions common in prior work.

Contribution

Novel Mixture-of-Experts framework for modeling heterogeneous causal subgraphs at instance level

The authors introduce a MoE framework specifically designed to handle instance-level heterogeneity in causal subgraphs by allowing multiple experts to generate distinct causal hypotheses, with sparse gating adaptively focusing on the most relevant experts for each input graph.