Self-Consistency Improves the Trustworthiness of Self-Interpretable GNNs

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Self-interpretable GNNs; Trustworthiness; Consistency; Faithfulness
Abstract:

Graph Neural Networks (GNNs) achieve strong predictive performance but offer limited transparency in their decision-making. Self-Interpretable GNNs (SI-GNNs) address this by generating built-in explanations, yet their training objectives are misaligned with evaluation criteria such as faithfulness. This raises two key questions: (i) can faithfulness be explicitly optimized during training, and (ii) does such optimization genuinely improve explanation quality? We show that faithfulness is intrinsically tied to explanation self-consistency and can therefore be optimized directly. Empirical analysis further reveals that self-inconsistency predominantly occurs on unimportant features, linking it to redundancy-driven explanation inconsistency observed in recent work and suggesting untapped potential for improving explanation quality. Building on these insights, we introduce a simple, model-agnostic self-consistency (SC) training strategy. Without changing architectures or pipelines, SC consistently improves explanation quality across multiple dimensions and benchmarks, offering an effective and scalable pathway to more trustworthy GNN explanations.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a self-consistency training strategy to improve explanation quality in self-interpretable GNNs by directly optimizing faithfulness during training. It resides in the 'Faithfulness-Driven Training Strategies' leaf, which contains only three papers total including this work. This represents a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting the specific focus on training-time faithfulness optimization remains underexplored compared to architectural innovations or post-hoc evaluation methods that dominate other branches.

The taxonomy reveals neighboring work in 'Causal and Information-Theoretic Training Objectives' and 'Pre-Training and Transfer Learning for Interpretability', indicating alternative approaches to improving explanation quality through different training paradigms. The sibling papers in the same leaf address faithfulness through sufficient-necessary explanation decomposition and semantic-level supervision, representing distinct technical strategies within the shared goal of aligning training objectives with explanation criteria. The broader 'Training Objectives and Optimization' branch remains less populated than 'Self-Interpretable GNN Architectures' or 'Post-Hoc Explanation Methods', highlighting a gap between architectural design and training methodology research.

Among 24 candidates examined across three contributions, none were identified as clearly refuting the proposed work. The first contribution linking faithfulness to self-consistency examined 10 candidates with no refutations, the self-consistency training strategy examined 4 candidates with no refutations, and the empirical analysis of inconsistency patterns examined 10 candidates with no refutations. This suggests that within the limited search scope, the specific framing of faithfulness as self-consistency and the proposed training approach appear relatively novel, though the analysis does not claim exhaustive coverage of all potentially relevant prior work.

Based on the limited literature search of 24 semantically similar papers, the work appears to occupy a distinct position by connecting faithfulness optimization to self-consistency principles during training. The sparse population of the faithfulness-driven training leaf and absence of identified overlapping work within the examined candidates suggest potential novelty, though the restricted search scope means additional relevant work may exist beyond the top-K semantic matches and citation expansion performed.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 0

Research Landscape Overview

Core task: Improving explanation quality in self-interpretable graph neural networks. The field has evolved into several distinct branches that reflect different strategies for making GNN predictions more transparent. Self-Interpretable GNN Architectures and Frameworks[1] develop models that produce explanations by design, often using attention mechanisms, prototype learning[2][13][19], or concept-based representations[10][26]. Training Objectives and Optimization for Self-Interpretable GNNs focus on loss functions and learning strategies that encourage faithful, human-understandable explanations during training[4][22][23]. Evaluation and Analysis branches[40][43][44] address the challenge of measuring explanation quality, while Post-Hoc Explanation Methods[3][17][47] generate explanations after training using techniques like perturbation or gradient-based attribution. Task-Specific and Domain-Specific approaches[6][16][20][31][42][46] tailor interpretability to particular applications, and General Surveys[1][15] provide methodological overviews. Robustness and Trustworthiness branches examine the reliability of explanations under adversarial conditions or distribution shifts.

A particularly active line of work centers on faithfulness-driven training strategies, which aim to ensure that explanations accurately reflect the model's true reasoning process rather than merely appearing plausible. Self-Consistency Trustworthiness[0] sits squarely within this branch, emphasizing training objectives that enforce internal consistency between predictions and their explanations. This contrasts with nearby works like Sufficient Necessary Explanations[22], which decompose explanations into components that are both required and adequate for predictions, and SES[23], which focuses on semantic-level explanation supervision.
While many self-interpretable methods prioritize architectural innovations or post-hoc validation, the faithfulness-driven cluster addresses a fundamental tension: balancing predictive accuracy with the verifiability of explanations. Open questions persist around defining and measuring faithfulness rigorously[40][44], handling trade-offs between explanation simplicity and completeness, and extending these training strategies to diverse graph domains where ground-truth explanations remain scarce.

Claimed Contributions

Linking faithfulness to explanation self-consistency

The authors establish a theoretical connection showing that faithfulness in GNN explanations is fundamentally related to self-consistency. This insight enables faithfulness to be directly optimized during training rather than only evaluated post-hoc.

10 retrieved papers
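The claimed faithfulness-to-self-consistency link could be operationalized roughly as follows. This is a minimal sketch, assuming faithfulness is measured as agreement between the model's prediction on the full graph and on the explanation subgraph, and self-consistency as similarity between two importance masks from repeated passes; the function names and exact formulas are illustrative, not the authors' definitions.

```python
import numpy as np

def faithfulness(pred_full: np.ndarray, pred_explained: np.ndarray) -> float:
    """Faithfulness as agreement between the prediction on the full graph
    and the prediction on the explanation subgraph (1.0 = identical)."""
    # Total-variation-style distance between the two class distributions.
    return 1.0 - 0.5 * float(np.abs(pred_full - pred_explained).sum())

def self_consistency(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Self-consistency as cosine similarity between two importance masks
    produced by the same model for the same input."""
    return float(mask_a @ mask_b
                 / (np.linalg.norm(mask_a) * np.linalg.norm(mask_b)))

# Toy illustration: two explanation passes that largely agree.
mask_run1 = np.array([0.90, 0.80, 0.10, 0.05])
mask_run2 = np.array([0.85, 0.82, 0.15, 0.02])
print(self_consistency(mask_run1, mask_run2))
```

Both quantities are differentiable in the masks and predictions, which is what would make training-time optimization (rather than post-hoc evaluation only) possible.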
Self-consistency training strategy for SI-GNNs

The authors propose a two-step training framework that adds a self-consistency loss to enforce agreement between successive explanations. This strategy is model-agnostic, requires no architectural changes, and can be applied to existing SI-GNNs.

4 retrieved papers
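A self-consistency loss of the kind described above could look roughly like the sketch below. The MSE penalty, the weighting factor `lam`, and the comparison against the previous pass's mask are all assumptions for illustration; the paper's exact loss is not reproduced here.

```python
import numpy as np

def sc_loss(mask_t: np.ndarray, mask_prev: np.ndarray) -> float:
    """Self-consistency penalty: mean squared disagreement between the
    current explanation mask and the mask from the previous step/pass."""
    return float(np.mean((mask_t - mask_prev) ** 2))

def total_loss(task_loss: float, mask_t: np.ndarray,
               mask_prev: np.ndarray, lam: float = 0.5) -> float:
    """Two-step objective: standard task loss plus the weighted SC term."""
    return task_loss + lam * sc_loss(mask_t, mask_prev)

# Usage: identical successive masks incur no penalty.
print(total_loss(0.8, np.array([0.9, 0.1]), np.array([0.9, 0.1])))
```

Because the penalty depends only on the explanation masks, not on any particular architecture, a term of this shape can be added to an existing SI-GNN's objective without structural changes, which matches the report's "model-agnostic" claim.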
Empirical analysis of self-inconsistency patterns

The authors conduct empirical studies demonstrating that self-inconsistency in SI-GNN explanations primarily arises from unimportant features. This finding connects self-inconsistency to the redundancy problem identified in prior work and motivates their training approach.

10 retrieved papers
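The reported pattern, with inconsistency concentrating on unimportant features, can be illustrated with a small synthetic simulation. The data below is fabricated for illustration only and is not the authors' experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated importance masks from 10 passes over the same graph:
# two genuinely important edges (high, stable scores) and three
# unimportant edges (low scores with run-to-run noise).
important = rng.normal(loc=0.9, scale=0.02, size=(10, 2))
unimportant = rng.normal(loc=0.1, scale=0.15, size=(10, 3))
masks = np.clip(np.hstack([important, unimportant]), 0.0, 1.0)

mean_importance = masks.mean(axis=0)  # per-edge average score
inconsistency = masks.std(axis=0)     # per-edge run-to-run variability

# Inconsistency concentrates on the low-importance edges.
print(inconsistency[:2].mean(), inconsistency[2:].mean())
```

Under this toy setup, the high-importance edges show low variance across passes while the low-importance edges do not, mirroring the paper's claim that penalizing self-inconsistency mainly suppresses redundant, unimportant parts of the explanation.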

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
