Joint Distribution–Informed Shapley Values for Sparse Counterfactual Explanations

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Counterfactual Explanations, Shapley Values, Optimization, Explainable Machine Learning
Abstract:

Counterfactual explanations (CEs) aim to reveal how small input changes flip a model's prediction, yet many methods modify more features than necessary, reducing clarity and actionability. We introduce COLA, a model- and generator-agnostic post-hoc framework that refines any given CE by computing a coupling via optimal transport (OT) between the factual and counterfactual sets and using it to drive a Shapley-based attribution, p-SHAP, that selects a minimal set of edits while preserving the target effect. Theoretically, OT minimizes an upper bound on the W_1 (1-Wasserstein) divergence between factual and counterfactual outcomes, and, under mild conditions, refined counterfactuals are guaranteed not to move farther from the factuals than the originals. Empirically, across four datasets, twelve models, and five CE generators, COLA achieves the same target effects with only 26–45% of the original feature edits. On a small-scale benchmark, COLA is near-optimal.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces COLA, a post-hoc refinement framework that uses optimal transport coupling and Shapley-based attribution to reduce feature edits in counterfactual explanations. Within the taxonomy, it resides in the 'Optimal Transport and Coupling-Based Refinement' leaf under 'Optimization-Based Counterfactual Generation Methods'. Notably, this leaf contains only the original paper itself—no sibling papers are present. This isolation suggests the specific combination of optimal transport coupling with Shapley-driven feature selection for counterfactual refinement represents a relatively sparse research direction within the broader optimization-based counterfactual landscape.

The taxonomy reveals that COLA's parent branch, 'Optimization-Based Counterfactual Generation Methods', contains several neighboring leaves: 'Gradient-Based and Perturbation Optimization' (3 papers), 'Shapley Value-Guided Optimization' (2 papers), and 'Multi-Objective and Constraint-Based Optimization' (5 papers). While Shapley values appear in adjacent work for feature prioritization, and optimal transport exists in theoretical frameworks, the taxonomy structure indicates that coupling-based refinement as a distinct methodological approach has not been extensively explored. The scope note clarifies this leaf excludes methods using optimal transport only for post-hoc refinement without coupling theory, suggesting a narrow definitional boundary.

Among 29 candidates examined across three contributions, the 'p-SHAP' component shows one refutable candidate from 10 examined, indicating some overlap with prior Shapley-based attribution methods. The 'COLA framework' contribution examined 10 candidates with zero refutations, suggesting the overall refinement architecture appears distinct within the limited search scope. The 'theoretical guarantees' contribution also shows no refutations across 9 candidates. These statistics reflect a top-K semantic search, not exhaustive coverage, meaning the apparent novelty of COLA's coupling-driven refinement may stem partly from the nascent state of this specific methodological intersection rather than comprehensive field saturation.

Given the limited search scope of 29 candidates and the taxonomy's structural sparsity in this leaf, COLA appears to occupy a relatively unexplored niche combining optimal transport coupling with Shapley attribution for counterfactual refinement. The single refutation for p-SHAP suggests incremental overlap in attribution mechanics, while the framework's overall architecture shows distinctiveness within the examined literature. However, the analysis cannot rule out relevant work outside the top-K semantic matches or in adjacent optimization paradigms not captured by the current taxonomy boundaries.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 1

Research Landscape Overview

Core task: refining counterfactual explanations with minimal feature modifications. The field of counterfactual explanations has evolved into a rich landscape organized around distinct methodological and application-oriented branches. Optimization-Based Counterfactual Generation Methods form a central pillar, encompassing gradient-driven approaches, constraint satisfaction techniques like Minimal Satisfiable Perturbations[2], and specialized refinements using optimal transport or coupling strategies. Generative Model-Based Counterfactual Explanations leverage VAEs, GANs, and diffusion models such as Diffusion Visual Counterfactual[9] to produce realistic alternatives, while Evolutionary and Search-Based Counterfactual Methods employ genetic algorithms and heuristic search. Case-Based and Retrieval-Augmented Counterfactuals retrieve similar instances from data, and Domain-Specific Counterfactual Explanation Applications address tailored challenges in healthcare, finance, autonomous driving, and other areas. Actionability, Feasibility, and User-Centric Counterfactuals emphasize practical constraints and human interpretability, exemplified by works like Actionable Minimality[6]. Additional branches include Counterfactual Data Augmentation, Unified Frameworks, Concept-Based explanations, and emerging Large Language Model-Based generation approaches.

Within the optimization landscape, a key tension exists between achieving minimal perturbations and ensuring actionability or semantic coherence. Many studies focus on sparsity and proximity metrics, while others incorporate causal constraints or domain knowledge to enhance feasibility. Joint Distribution Shapley[0] situates itself within the Optimal Transport and Coupling-Based Refinement cluster, emphasizing distributional alignment to refine counterfactuals beyond simple distance minimization.
This contrasts with gradient-based methods like CF-GNNExplainer[3] for graph data or boundary-focused approaches such as Minimal Feature Boundary[4], which prioritize decision surface proximity. The interplay between theoretical rigor—ensuring minimal yet meaningful changes—and practical deployment remains an active research question, with Joint Distribution Shapley[0] contributing a principled coupling perspective that complements existing optimization paradigms by addressing joint feature dependencies.

Claimed Contributions

COLA framework for sparse counterfactual explanations

The authors propose COLA (COunterfactuals with Limited Actions), a general post-hoc framework that refines counterfactual explanations across different models and CE generators. It uses optimal transport to compute a coupling between factual and counterfactual sets, which then guides Shapley-based attribution to select minimal feature edits while preserving target effects.

10 retrieved papers
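The coupling step in this contribution can be sketched concretely. The paper's actual OT formulation and cost function are not specified in this summary, so the sketch below assumes equal-sized factual and counterfactual sets with uniform weights and a Euclidean cost, under which the OT plan reduces to an optimal one-to-one assignment solvable exactly by the Hungarian algorithm; `ot_coupling` is a hypothetical helper name, not the paper's API.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def ot_coupling(factuals, counterfactuals):
    """OT coupling between two equal-sized point sets with uniform
    weights: the optimal plan is then a permutation (one-to-one
    matching), found exactly by the Hungarian algorithm."""
    cost = cdist(factuals, counterfactuals)   # pairwise Euclidean costs
    rows, cols = linear_sum_assignment(cost)  # minimal-cost matching
    return dict(zip(rows.tolist(), cols.tolist())), cost[rows, cols].mean()

# Toy sets: 3 factual points and 3 counterfactual points in 2-D.
F = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
C = np.array([[0.1, 1.0], [0.1, 0.1], [1.1, 0.1]])
pairs, avg_cost = ot_coupling(F, C)
# `pairs` maps each factual index to its coupled counterfactual index;
# this coupling is what would feed the downstream Shapley attribution.
```

In this toy example each factual point is matched to the counterfactual nearest under the global plan rather than greedily per point, which is the property a coupling-based refinement relies on.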
Joint distribution-informed Shapley values (p-SHAP)

The authors introduce p-SHAP, a Shapley value method that integrates an algorithm returning joint probability between factual and counterfactual instances. This method unifies other commonly used Shapley methods under appropriate couplings and provides a modular interface for attribution and edit selection.

10 retrieved papers
Can Refute
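As a rough illustration of Shapley-based edit attribution (not the paper's p-SHAP algorithm, whose coupling-dependent value function is not detailed here), the sketch below computes exact Shapley values where each "player" is the edit of one feature from its factual value to its coupled counterfactual value and the payoff is the model output; `shapley_edit_attribution` and the toy model are hypothetical.

```python
from itertools import combinations
from math import comb

def shapley_edit_attribution(model, x_fact, x_cf):
    """Exact Shapley values over feature *edits*: feature i's value is
    its average marginal effect of switching x_fact[i] -> x_cf[i],
    taken over all coalitions of the other edits."""
    d = len(x_fact)
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(len(others) + 1):
            weight = 1.0 / (d * comb(d - 1, k))  # Shapley coalition weight
            for S in combinations(others, k):
                z = list(x_fact)
                for j in S:            # apply the coalition's edits
                    z[j] = x_cf[j]
                base = model(z)
                z[i] = x_cf[i]         # add edit i on top
                phi[i] += weight * (model(z) - base)
    return phi

# Hypothetical toy model: linear, with the third feature irrelevant.
model = lambda z: 2.0 * z[0] + 1.0 * z[1]
phi = shapley_edit_attribution(model, [0.0, 0.0, 0.0], [1.0, 1.0, 5.0])
```

By Shapley efficiency the attributions sum to the total prediction change, so edits with near-zero attribution (here, the third feature) can be dropped to obtain a sparser counterfactual with the same target effect in this toy setting.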
Theoretical guarantees for OT-based counterfactual refinement

The authors provide theoretical results showing that optimal transport minimizes an upper bound on the 1-Wasserstein divergence between factual and counterfactual outcomes. They also prove that under mild conditions, refined counterfactuals remain no farther from factuals than the original counterfactuals.

9 retrieved papers
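The first guarantee can be illustrated numerically in one dimension, where the 1-Wasserstein distance between empirical samples is available in SciPy. The sketch below builds a synthetic case in which a "refined" counterfactual set keeps the same mean shift in outcomes while removing extra spread, and compares its W_1 distance to the factual outcomes against the original's. This is an illustrative check under assumed distributions, not the paper's proof.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Synthetic 1-D model outcomes for a factual set.
y_fact = rng.normal(0.0, 1.0, 500)

# "Original" counterfactual outcomes: target shift of +2 plus extra
# nonnegative noise; "refined" outcomes keep the shift, drop the noise.
extra = np.abs(rng.normal(0.0, 1.0, 500))
y_cf_orig = y_fact + 2.0 + extra
y_cf_refined = y_fact + 2.0

w_orig = wasserstein_distance(y_fact, y_cf_orig)
w_refined = wasserstein_distance(y_fact, y_cf_refined)
# Refinement should not move outcomes farther (in W_1) from the factuals.
```

Because every original counterfactual outcome exceeds its refined counterpart, the refined empirical distribution is pointwise closer in quantiles, so `w_refined <= w_orig` holds by construction here, mirroring the claimed "no farther than the originals" property.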

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

COLA framework for sparse counterfactual explanations

The authors propose COLA (COunterfactuals with Limited Actions), a general post-hoc framework that refines counterfactual explanations across different models and CE generators. It uses optimal transport to compute a coupling between factual and counterfactual sets, which then guides Shapley-based attribution to select minimal feature edits while preserving target effects.

Contribution

Joint distribution-informed Shapley values (p-SHAP)

The authors introduce p-SHAP, a Shapley value method that integrates an algorithm returning joint probability between factual and counterfactual instances. This method unifies other commonly used Shapley methods under appropriate couplings and provides a modular interface for attribution and edit selection.

Contribution

Theoretical guarantees for OT-based counterfactual refinement

The authors provide theoretical results showing that optimal transport minimizes an upper bound on the 1-Wasserstein divergence between factual and counterfactual outcomes. They also prove that under mild conditions, refined counterfactuals remain no farther from factuals than the original counterfactuals.