Answering Counterfactual Queries on Graph Databases

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Counterfactual Analysis, Graph Database
Abstract:

Counterfactual analysis on graph data is central to causal reasoning and interpretability, yet existing graph-based methods rely on ad hoc perturbations and remain tied to model behavior rather than underlying data. To address this challenge, we introduce Counterfactual Graph Database (CF-GDB) queries, the first query-based framework for counterfactual reasoning on graphs that grounds counterfactuals in verifiable database instances. Our approach abstracts graphs into semantically meaningful concepts and compares them using a hypergraph-based distance that integrates local structure with global semantics. To ensure efficiency and scalability, we propose two complementary indices: the Concept Distribution Index (CDI), a histogram that provides certified lower bounds, and the Concept Semantic Index (CSI), a continuous embedding that provides upper bounds. These indices yield provably tight sandwich guarantees and enable efficient candidate pruning while preserving the fidelity of counterfactual retrieval. Using 8 real data sets across 4 domains, CF-GDB improves accuracy by over 20% and achieves up to 20× faster performance, demonstrating both fidelity and scalability.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces CF-GDB, a query-based framework for counterfactual reasoning on graph databases that grounds counterfactuals in verifiable database instances rather than model perturbations. It resides in the 'Counterfactual Query Systems and Database Integration' leaf, which contains only five papers total (including this one). This is a relatively sparse research direction within the broader taxonomy of 50 papers, suggesting that database-centric counterfactual query systems remain an emerging area compared to the more crowded GNN explainability branches.

The taxonomy reveals that most counterfactual graph research concentrates on GNN explanation methods (12 papers across instance-level and global explainers) and fairness applications (5 papers). The submission's leaf sits alongside work on what-if databases and counterfactual visualization frameworks, but diverges from neighboring branches focused on causal discovery, LLM-based causal reasoning, and application-specific methods in recommendation or knowledge graphs. The leaf's scope note emphasizes integration with query languages and database systems, which distinguishes this work from the theoretical causal frameworks and GNN-centric explanation techniques that dominate sibling categories.

Across the three contributions, 17 candidate papers were examined, and none was found to refute a claimed contribution. The breakdown is 10 candidates for the CF-GDB framework, 2 for the C2GQ method, and 5 for the dual indexing scheme, each with zero refutations. Within this limited search scope (top-K semantic matches plus citation expansion), the specific combination of concept-based abstraction, hypergraph distance, and dual indexing for counterfactual graph queries appears distinct from the examined prior work.

Based on the limited literature search of 17 candidates, the work appears to occupy a relatively novel position at the intersection of counterfactual reasoning and database query systems. However, the sparse population of the target leaf (5 papers) and the modest search scope mean this assessment reflects only the examined neighborhood, not an exhaustive survey of all potentially relevant database, graph query, or counterfactual reasoning literature.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 17
Refutable Papers: 0

Research Landscape Overview

Core task: counterfactual reasoning on graph databases. The field encompasses a diverse set of approaches that apply counterfactual and causal thinking to graph-structured data. At the highest level, the taxonomy organizes work into several major branches: methods for explaining graph neural network predictions via counterfactual examples (e.g., Gcfexplainer[4], CF-GNNExplainer[8]), techniques addressing fairness and bias mitigation through counterfactual notions (e.g., Authentic Counterfactuals Fairness[10], Counterfactual Fairness Representation[18]), frameworks for counterfactual learning and data augmentation on graphs (e.g., Counterfactual Learning Graphs Survey[1], GraphCA[27]), general causal inference and reasoning systems (e.g., Causal Inference[3], Deep Causal Graphs[26]), integration of counterfactual queries with database systems (e.g., What If Databases[11], Counterfactual Graph Queries[0]), and application-specific methods spanning domains such as recommendation, anomaly detection, and knowledge graphs (e.g., Causal Knowledge Graph Recommendation[34], Counterfactual Anomaly Detection[14]).

A particularly active line of work focuses on explainability for GNNs, where researchers seek minimal graph edits that flip model predictions, balancing interpretability with robustness (Robust Counterfactual GNN[5], Global Counterfactual GNN[6]). In contrast, the database integration branch explores how counterfactual queries can be formulated and executed over structured graph repositories, enabling users to ask "what if" questions directly within query languages. Counterfactual Graph Queries[0] sits squarely in this latter branch, alongside What If Databases[11] and Counterfactual Visualization Framework[13], emphasizing the operational and system-level challenges of embedding counterfactual reasoning into database workflows. Compared to neighbors like SIERRA[23] or Causal Hyper[20], which lean toward causal discovery or hypergraph reasoning, the original paper prioritizes query expressiveness and integration with existing database infrastructure, reflecting a systems-oriented perspective on counterfactual reasoning rather than purely algorithmic or fairness-driven concerns.

Claimed Contributions

Counterfactual Graph Database (CF-GDB) framework

The authors propose CF-GDB, a novel framework that reframes counterfactual reasoning as a query problem over graph databases. Unlike prior approaches that generate perturbed graphs to flip model predictions, CF-GDB retrieves dataset-grounded, domain-valid counterfactuals anchored in verifiable instances.
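To make the retrieval framing concrete, the following is a minimal sketch of a counterfactual query as a lookup over stored instances: return the closest stored graph whose label differs from the query's. It is not the authors' implementation. The `Instance` type, the feature-vector encoding, and the `l1_distance` placeholder are all assumptions introduced here for illustration; the paper's own distance is concept-based.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instance:
    """A stored graph, reduced to an id, a label, and a feature vector (hypothetical encoding)."""
    graph_id: str
    label: str
    features: Tuple[float, ...]


def l1_distance(a: Instance, b: Instance) -> float:
    """Placeholder distance; a stand-in for the paper's concept distance."""
    return float(sum(abs(x - y) for x, y in zip(a.features, b.features)))


def counterfactual_query(
    query: Instance,
    database: Sequence[Instance],
    distance: Callable[[Instance, Instance], float] = l1_distance,
) -> Optional[Instance]:
    """Return the closest stored instance whose label differs from the query's."""
    candidates = [g for g in database if g.label != query.label]
    if not candidates:
        return None  # no stored instance carries a different label
    return min(candidates, key=lambda g: distance(query, g))


# Toy database: labels and features are invented for this example.
database = [
    Instance("g1", "toxic", (3.0, 1.0)),
    Instance("g2", "benign", (3.0, 2.0)),
    Instance("g3", "benign", (9.0, 9.0)),
]
query = Instance("q", "toxic", (3.0, 1.0))
result = counterfactual_query(query, database)
print(result.graph_id)  # prints "g2"
```

Because the answer is always an existing database entry, it can be inspected and verified directly, which is the property the contribution emphasizes over perturbation-based generation.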

10 retrieved papers

Concept-Based Counterfactual Graph Query (C2GQ) method

The authors introduce C2GQ, which abstracts graphs into semantically meaningful concepts serving as prototypes that cluster structurally similar subgraphs. Differences are measured using a hypergraph-based concept distance grounded in unbalanced optimal transport, jointly capturing fine-grained local changes and global distributional shifts.
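As a rough illustration of what an unbalanced optimal transport comparison between two concept histograms looks like, the sketch below uses the standard entropic formulation with KL-relaxed marginals, solved by Sinkhorn-style scaling. It is an assumption-laden stand-in, not the paper's hypergraph-based distance: the function names, the concept cost matrix, and the parameters `eps` and `rho` are hypothetical, and both histograms are assumed to be strictly positive.

```python
import math


def generalized_kl(p, q):
    """Generalized KL divergence between nonnegative vectors (q must be strictly positive)."""
    return sum((pi * math.log(pi / qi) if pi > 0 else 0.0) - pi + qi
               for pi, qi in zip(p, q))


def unbalanced_ot(a, b, cost, eps=0.1, rho=1.0, iters=500):
    """Entropic unbalanced OT between histograms a and b (both strictly positive).

    Returns the transport cost plus rho-weighted KL penalties on the two
    marginals, evaluated at the plan found by Sinkhorn-style scaling.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    tau = rho / (rho + eps)  # exponent that softens the marginal constraints
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        kv = [sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / kv[i]) ** tau for i in range(n)]
        ktu = [sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / ktu[j]) ** tau for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    row = [sum(plan[i]) for i in range(n)]
    col = [sum(plan[i][j] for i in range(n)) for j in range(m)]
    transport = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return transport + rho * (generalized_kl(row, a) + generalized_kl(col, b))


# Two concepts with unit cost between them (hypothetical values).
concept_cost = [[0.0, 1.0], [1.0, 0.0]]
d_same = unbalanced_ot([0.5, 0.5], [0.5, 0.5], concept_cost)
d_diff = unbalanced_ot([0.5, 0.5], [0.9, 0.1], concept_cost)
print(d_same < d_diff)  # identical histograms score lower than shifted ones
```

The relaxed marginals are what let the comparison tolerate graphs with different total concept mass: mass can be created or destroyed at a penalty controlled by `rho` instead of being forced to move. Note that entropic smoothing leaves a small nonzero value even for identical histograms.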

2 retrieved papers

Dual indexing scheme with certified bounds

The authors propose two complementary indices for scalable counterfactual queries: CDI provides certified lower bounds via histogram-based concept counts, while CSI provides upper bounds through continuous embeddings. These indices yield provably tight sandwich guarantees and enable efficient candidate pruning while preserving retrieval fidelity.
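The pruning logic that a lower/upper bound pair enables can be sketched as a generic filter-and-refine search. The sketch below is not CDI or CSI: it uses an L1 distance over count vectors with two deliberately simple, provably valid bounds as stand-ins, and every name in it is hypothetical. The only property it relies on is lower(q, g) <= exact(q, g) <= upper(q, g); under that assumption the true nearest neighbor is never pruned, since its lower bound cannot exceed the smallest upper bound.

```python
def exact_l1(x, y):
    """Stand-in for the expensive exact distance."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def lower_bound(x, y):
    """Cheap lower bound: |sum(x) - sum(y)| <= sum(|x_i - y_i|)."""
    return abs(sum(x) - sum(y))


def upper_bound(x, y):
    """Cheap upper bound for nonnegative vectors: |x_i - y_i| <= x_i + y_i."""
    return sum(x) + sum(y)


def sandwich_search(query, database, lower, upper, exact):
    """Nearest-neighbor search that skips candidates ruled out by the bounds.

    Returns the best candidate and how many exact distances were computed.
    """
    # The true minimum distance cannot exceed the smallest upper bound.
    threshold = min(upper(query, g) for g in database)
    # Any candidate whose lower bound exceeds that threshold cannot win.
    survivors = [g for g in database if lower(query, g) <= threshold]
    best = min(survivors, key=lambda g: exact(query, g))
    return best, len(survivors)


# Toy concept-count vectors (invented for this example).
query = (3, 1, 0)
database = [(3, 0, 1), (10, 10, 10), (0, 0, 1), (10, 0, 0)]
best, evaluated = sandwich_search(query, database, lower_bound, upper_bound, exact_l1)
print(best, evaluated)  # prints "(3, 0, 1) 2"
```

Here only two of the four candidates reach the exact distance computation. In a counterfactual setting the database would first be restricted to instances whose label differs from the query's, and tighter bounds would prune more aggressively.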

5 retrieved papers
