Self-Consistency Improves the Trustworthiness of Self-Interpretable GNNs

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Self-interpretable GNNs; Trustworthiness; Consistency; Faithfulness
Abstract:

Graph Neural Networks (GNNs) achieve strong predictive performance but offer limited transparency in their decision-making. Self-Interpretable GNNs (SI-GNNs) address this by generating built-in explanations, yet their training objectives are misaligned with evaluation criteria such as faithfulness. This raises two key questions: (i) can faithfulness be explicitly optimized during training, and (ii) does such optimization genuinely improve explanation quality? We show that faithfulness is intrinsically tied to explanation self-consistency and can therefore be optimized directly. Empirical analysis further reveals that self-inconsistency predominantly occurs on unimportant features, linking it to redundancy-driven explanation inconsistency observed in recent work and suggesting untapped potential for improving explanation quality. Building on these insights, we introduce a simple, model-agnostic self-consistency (SC) training strategy. Without changing architectures or pipelines, SC consistently improves explanation quality across multiple dimensions and benchmarks, offering an effective and scalable pathway to more trustworthy GNN explanations.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a self-consistency training strategy to improve explanation quality in self-interpretable GNNs by directly optimizing faithfulness during training. It resides in the 'Faithfulness-Driven Training Strategies' leaf, which contains only three papers total including this work. This represents a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting the specific focus on training-time faithfulness optimization remains underexplored compared to architectural innovations or post-hoc evaluation methods that dominate other branches.

The taxonomy reveals neighboring work in 'Causal and Information-Theoretic Training Objectives' and 'Pre-Training and Transfer Learning for Interpretability', indicating alternative approaches to improving explanation quality through different training paradigms. The sibling papers in the same leaf address faithfulness through sufficient-necessary explanation decomposition and semantic-level supervision, representing distinct technical strategies within the shared goal of aligning training objectives with explanation criteria. The broader 'Training Objectives and Optimization' branch remains less populated than 'Self-Interpretable GNN Architectures' or 'Post-Hoc Explanation Methods', highlighting a gap between architectural design and training methodology research.

Among 24 candidates examined across three contributions, none were identified as clearly refuting the proposed work. The first contribution linking faithfulness to self-consistency examined 10 candidates with no refutations, the self-consistency training strategy examined 4 candidates with no refutations, and the empirical analysis of inconsistency patterns examined 10 candidates with no refutations. This suggests that within the limited search scope, the specific framing of faithfulness as self-consistency and the proposed training approach appear relatively novel, though the analysis does not claim exhaustive coverage of all potentially relevant prior work.

Based on the limited literature search of 24 semantically similar papers, the work appears to occupy a distinct position by connecting faithfulness optimization to self-consistency principles during training. The sparse population of the faithfulness-driven training leaf and absence of identified overlapping work within the examined candidates suggest potential novelty, though the restricted search scope means additional relevant work may exist beyond the top-K semantic matches and citation expansion performed.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 0

Research Landscape Overview

Core task: Improving explanation quality in self-interpretable graph neural networks. The field has evolved into several distinct branches that reflect different strategies for making GNN predictions more transparent. Self-Interpretable GNN Architectures and Frameworks[1] develop models that produce explanations by design, often using attention mechanisms, prototype learning[2][13][19], or concept-based representations[10][26]. Training Objectives and Optimization for Self-Interpretable GNNs focus on loss functions and learning strategies that encourage faithful, human-understandable explanations during training[4][22][23]. Evaluation and Analysis branches[40][43][44] address the challenge of measuring explanation quality, while Post-Hoc Explanation Methods[3][17][47] generate explanations after training using techniques like perturbation or gradient-based attribution. Task-Specific and Domain-Specific approaches[6][16][20][31][42][46] tailor interpretability to particular applications, and General Surveys[1][15] provide methodological overviews. Robustness and Trustworthiness branches examine the reliability of explanations under adversarial conditions or distribution shifts.

A particularly active line of work centers on faithfulness-driven training strategies, which aim to ensure that explanations accurately reflect the model's true reasoning process rather than merely appearing plausible. Self-Consistency Trustworthiness[0] sits squarely within this branch, emphasizing training objectives that enforce internal consistency between predictions and their explanations. This contrasts with nearby works like Sufficient Necessary Explanations[22], which decompose explanations into components that are both required and adequate for predictions, and SES[23], which focuses on semantic-level explanation supervision.
While many self-interpretable methods prioritize architectural innovations or post-hoc validation, the faithfulness-driven cluster addresses a fundamental tension: balancing predictive accuracy with the verifiability of explanations. Open questions persist around defining and measuring faithfulness rigorously[40][44], handling trade-offs between explanation simplicity and completeness, and extending these training strategies to diverse graph domains where ground-truth explanations remain scarce.

Claimed Contributions

Linking faithfulness to explanation self-consistency

The authors establish a theoretical connection showing that faithfulness in GNN explanations is fundamentally related to self-consistency. This insight enables faithfulness to be directly optimized during training rather than only evaluated post-hoc.

10 retrieved papers
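The claimed faithfulness-to-self-consistency link could be operationalized roughly as follows. This is a minimal sketch, assuming faithfulness is measured as agreement between the model's prediction on the full graph and on the explanation subgraph, and self-consistency as similarity between two importance masks from repeated passes; the function names and exact formulas are illustrative, not the authors' definitions.

```python
import numpy as np

def faithfulness(pred_full: np.ndarray, pred_explained: np.ndarray) -> float:
    """Faithfulness as agreement between the prediction on the full graph
    and the prediction on the explanation subgraph (1.0 = identical)."""
    # Total-variation-style distance between the two class distributions.
    return 1.0 - 0.5 * float(np.abs(pred_full - pred_explained).sum())

def self_consistency(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Self-consistency as cosine similarity between two importance masks
    produced by the same model for the same input."""
    return float(mask_a @ mask_b
                 / (np.linalg.norm(mask_a) * np.linalg.norm(mask_b)))

# Toy illustration: two explanation passes that largely agree.
mask_run1 = np.array([0.90, 0.80, 0.10, 0.05])
mask_run2 = np.array([0.85, 0.82, 0.15, 0.02])
print(self_consistency(mask_run1, mask_run2))
```

Both quantities are differentiable in the masks and predictions, which is what would make training-time optimization (rather than post-hoc evaluation only) possible.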
Self-consistency training strategy for SI-GNNs

The authors propose a two-step training framework that adds a self-consistency loss to enforce agreement between successive explanations. This strategy is model-agnostic, requires no architectural changes, and can be applied to existing SI-GNNs.

4 retrieved papers
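A self-consistency loss of the kind described above could look roughly like the sketch below. The MSE penalty, the weighting factor `lam`, and the comparison against the previous pass's mask are all assumptions for illustration; the paper's exact loss is not reproduced here.

```python
import numpy as np

def sc_loss(mask_t: np.ndarray, mask_prev: np.ndarray) -> float:
    """Self-consistency penalty: mean squared disagreement between the
    current explanation mask and the mask from the previous step/pass."""
    return float(np.mean((mask_t - mask_prev) ** 2))

def total_loss(task_loss: float, mask_t: np.ndarray,
               mask_prev: np.ndarray, lam: float = 0.5) -> float:
    """Two-step objective: standard task loss plus the weighted SC term."""
    return task_loss + lam * sc_loss(mask_t, mask_prev)

# Usage: identical successive masks incur no penalty.
print(total_loss(0.8, np.array([0.9, 0.1]), np.array([0.9, 0.1])))
```

Because the penalty depends only on the explanation masks, not on any particular architecture, a term of this shape can be added to an existing SI-GNN's objective without structural changes, which matches the report's "model-agnostic" claim.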
Empirical analysis of self-inconsistency patterns

The authors conduct empirical studies demonstrating that self-inconsistency in SI-GNN explanations primarily arises from unimportant features. This finding connects self-inconsistency to the redundancy problem identified in prior work and motivates their training approach.

10 retrieved papers
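The reported pattern, with inconsistency concentrating on unimportant features, can be illustrated with a small synthetic simulation. The data below is fabricated for illustration only and is not the authors' experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated importance masks from 10 passes over the same graph:
# two genuinely important edges (high, stable scores) and three
# unimportant edges (low scores with run-to-run noise).
important = rng.normal(loc=0.9, scale=0.02, size=(10, 2))
unimportant = rng.normal(loc=0.1, scale=0.15, size=(10, 3))
masks = np.clip(np.hstack([important, unimportant]), 0.0, 1.0)

mean_importance = masks.mean(axis=0)  # per-edge average score
inconsistency = masks.std(axis=0)     # per-edge run-to-run variability

# Inconsistency concentrates on the low-importance edges.
print(inconsistency[:2].mean(), inconsistency[2:].mean())
```

Under this toy setup, the high-importance edges show low variance across passes while the low-importance edges do not, mirroring the paper's claim that penalizing self-inconsistency mainly suppresses redundant, unimportant parts of the explanation.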

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
