Diverse and Sparse Mixture-of-Experts for Causal Subgraph–Based Out-of-Distribution Graph Learning

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: graph neural network; out-of-distribution learning; mixture-of-experts
Abstract:

Current state-of-the-art methods for out-of-distribution (OOD) generalization struggle with datasets whose causal subgraphs are heterogeneous at the instance level. Existing approaches that attempt to handle such heterogeneity either rely on data augmentation, which risks altering label semantics, or impose causal assumptions whose validity in real-world datasets is uncertain. We introduce a Mixture-of-Experts (MoE) framework that models heterogeneous causal subgraphs without restrictive assumptions. Our key idea is to address instance-level heterogeneity by enforcing semantic diversity among experts, each generating a distinct causal subgraph, while a learned gate assigns sparse weights that adaptively focus on the most relevant experts for each input. Our theoretical analysis shows that these two properties jointly reduce OOD error. In practice, our experts are scalable and require no environment labels. Empirically, our framework achieves strong performance on the GOOD benchmark across both synthetic and real-world structural shifts.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 23
Refutable Paper: 1
Research Landscape Overview

Core task: out-of-distribution generalization on graphs with heterogeneous causal subgraphs. The field addresses how graph neural networks can reliably generalize when test distributions differ from training, particularly when causal mechanisms vary across subgraphs. The taxonomy reveals several complementary research directions.

Causal Subgraph Extraction and Invariance Learning focuses on identifying and leveraging stable causal patterns, with works like Causal Attention Learning[2] and Debiasing Causal Substructure[10] extracting invariant substructures. Environment-Based Causal Modeling emphasizes partitioning data into environments to learn invariances, as seen in Joint Label Environment[17] and Soft Causal Environment[41]. Structural Causal Models and Theoretical Frameworks provide foundational principles, while Invariance Principles and Unified Learning Frameworks, exemplified by Unified Invariant Learning[32], seek to consolidate these ideas. Domain-Specific Applications tailor methods to contexts like molecular graphs (Shift-Robust Molecular Learning[11]) or brain networks (BrainOOD[4]), and Meta-Learning approaches such as Meta-learning Domain Generalization[24] adapt architectures dynamically. Explainability branches like Reinforced Causal Explainer[36] ensure interpretability, while Auxiliary Techniques incorporate complementary strategies such as data augmentation or contrastive learning.

A particularly active line of work centers on aggregating multiple causal subgraphs to capture diverse invariant patterns, addressing the challenge that a single causal mechanism may be insufficient for complex graphs. Diverse Sparse MoE[0] sits within this cluster, emphasizing diversity in causal subgraph discovery through mixture-of-experts architectures.
This contrasts with nearby works like Subgraph Aggregation OOD[8], which aggregates multiple subgraphs but may not explicitly enforce diversity, and PISA[40], which focuses on prototype-based invariant subgraph aggregation. The trade-off involves balancing the richness of multiple causal views against computational complexity and the risk of capturing spurious correlations. Open questions include how to optimally combine heterogeneous causal signals and whether diversity mechanisms genuinely improve robustness or introduce redundancy. Diverse Sparse MoE[0] contributes by proposing sparsity-driven expert selection to maintain diversity while controlling model complexity, positioning it as a methodological refinement within the broader effort to handle causal heterogeneity in graph OOD generalization.

Claimed Contributions

Theoretical justification for MoE in graph OOD learning via risk bound decomposition

The authors derive a formal OOD risk bound that decomposes error into coverage and selection terms, proving that semantic diversity among experts (ensuring coverage of causal mechanisms) and instance-level sparsity in gating (enabling correct expert selection) together reduce out-of-distribution generalization error.
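The claimed decomposition can be sketched schematically. The symbols below are illustrative placeholders chosen here for exposition, not the paper's actual notation, constants, or proof:

```latex
\mathcal{R}_{\mathrm{OOD}}(f)
  \;\le\;
  \mathcal{R}_{\mathrm{ID}}(f)
  \;+\;
  \underbrace{\epsilon_{\mathrm{cov}}}_{\substack{\text{coverage: some expert's subgraph}\\ \text{matches each causal mechanism}}}
  \;+\;
  \underbrace{\epsilon_{\mathrm{sel}}}_{\substack{\text{selection: the sparse gate}\\ \text{routes to that expert}}}
```

Under this reading, enforcing semantic diversity among experts drives $\epsilon_{\mathrm{cov}}$ down (more causal mechanisms are covered by some expert), while sparser, better-calibrated gating drives $\epsilon_{\mathrm{sel}}$ down (the covering expert actually receives the weight), which is how the two properties would jointly tighten the bound.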

10 retrieved papers
Causal subgraph-based MoE framework without environment labels or strong causal assumptions

The authors propose a practical MoE architecture where experts extract diverse causal subgraphs using a decorrelation regularizer and a learned gating network adaptively selects relevant experts, avoiding the need for environment labels or restrictive causal independence assumptions common in prior work.
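As a rough illustration of how such a decorrelation regularizer could operate, the sketch below penalizes pairwise correlation between per-expert embeddings. The function name, shapes, and penalty form are assumptions made here for illustration, not the paper's implementation:

```python
import numpy as np

def decorrelation_penalty(expert_embs: np.ndarray) -> float:
    """Penalize pairwise correlation between expert embeddings.

    expert_embs: shape (num_experts, dim), one embedding per expert
    for a single input graph (layout is an illustrative assumption).
    """
    # Center each expert's embedding and scale it to unit norm.
    centered = expert_embs - expert_embs.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True) + 1e-8
    unit = centered / norms
    # Gram matrix of cosine similarities between experts.
    gram = unit @ unit.T
    # Sum of squared off-diagonal entries: zero iff the experts'
    # centered embeddings are pairwise orthogonal.
    off_diag = gram - np.diag(np.diag(gram))
    return float((off_diag ** 2).sum())

# Two identical experts are heavily penalized; two experts whose
# centered embeddings are orthogonal incur (near-)zero penalty.
same = np.tile(np.array([[1.0, 2.0, 3.0]]), (2, 1))
ortho = np.array([[1.0, 0.0, -1.0], [1.0, -2.0, 1.0]])
```

A penalty of this form is minimized when expert representations are pairwise decorrelated, which is one standard way to push experts toward encoding distinct semantics, here, distinct causal-subgraph hypotheses.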

3 retrieved papers
Can Refute
Novel Mixture-of-Experts framework for modeling heterogeneous causal subgraphs at instance level

The authors introduce a MoE framework specifically designed to handle instance-level heterogeneity in causal subgraphs by allowing multiple experts to generate distinct causal hypotheses, with sparse gating adaptively focusing on the most relevant experts for each input graph.
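Instance-level sparse gating of the kind described is often realized as top-k routing. The sketch below shows that generic pattern, assumed here for illustration; the paper's gate may differ:

```python
import numpy as np

def sparse_gate(logits: np.ndarray, k: int = 2) -> np.ndarray:
    """Top-k sparse gating: softmax over the k largest logits,
    exactly zero weight on the rest (a common MoE routing scheme)."""
    top = np.argsort(logits)[-k:]            # indices of the k largest logits
    weights = np.zeros_like(logits, dtype=float)
    exp = np.exp(logits[top] - logits[top].max())  # stable softmax on top-k
    weights[top] = exp / exp.sum()
    return weights

# Per-instance logits (e.g., from a gating network applied to a graph
# embedding) yield a weight vector supported on only k experts.
w = sparse_gate(np.array([2.0, -1.0, 0.5, 1.5]), k=2)
```

Because the logits are computed per input, different graphs activate different subsets of experts, which is what lets the mixture adapt to instance-level heterogeneity while keeping inference cost roughly constant in k rather than in the total number of experts.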

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Theoretical justification for MoE in graph OOD learning via risk bound decomposition

The authors derive a formal OOD risk bound that decomposes error into coverage and selection terms, proving that semantic diversity among experts (ensuring coverage of causal mechanisms) and instance-level sparsity in gating (enabling correct expert selection) together reduce out-of-distribution generalization error.

Contribution

Causal subgraph-based MoE framework without environment labels or strong causal assumptions

The authors propose a practical MoE architecture where experts extract diverse causal subgraphs using a decorrelation regularizer and a learned gating network adaptively selects relevant experts, avoiding the need for environment labels or restrictive causal independence assumptions common in prior work.

Contribution

Novel Mixture-of-Experts framework for modeling heterogeneous causal subgraphs at instance level

The authors introduce a MoE framework specifically designed to handle instance-level heterogeneity in causal subgraphs by allowing multiple experts to generate distinct causal hypotheses, with sparse gating adaptively focusing on the most relevant experts for each input graph.