Provably Explaining Neural Additive Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: explainability, XAI, explainable AI, formal verification, sufficient explanations
Abstract:

Despite significant progress in post-hoc explanation methods for neural networks, many remain heuristic and lack provable guarantees. A key approach for obtaining explanations with provable guarantees is to identify a cardinally-minimal subset of input features that by itself is provably sufficient to determine the prediction. However, for standard neural networks, this task is often computationally infeasible, as it demands a worst-case exponential number of verification queries in the number of input features, each of which is NP-hard. In this work, we show that for Neural Additive Models (NAMs), a recent and more interpretable neural network family, we can efficiently generate explanations with such guarantees. We present a new model-specific algorithm for NAMs that generates provably cardinally-minimal explanations using only a logarithmic number of verification queries in the number of input features, after a parallelized preprocessing step, whose runtime is logarithmic in the required precision, is applied to each small univariate NAM component. Our algorithm not only makes the task of obtaining cardinally-minimal explanations feasible, but even outperforms existing algorithms designed to find subset-minimal explanations -- which may be larger and less informative but easier to compute -- despite our algorithm solving a much more difficult task. Our experiments demonstrate that our approach provides provably smaller explanations than previous algorithms while substantially reducing computation time. Moreover, we show that our generated provable explanations offer benefits that are unattainable by standard sampling-based techniques typically used to interpret NAMs.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper develops a model-specific algorithm for generating cardinally-minimal sufficient explanations in Neural Additive Models (NAMs), reducing verification complexity from exponential to logarithmic in the number of input features. Within the taxonomy, it occupies the sole position in the 'Cardinally-Minimal Sufficient Explanations with Verification' leaf under 'Provable Explanation Generation for Neural Additive Models'. This leaf contains only the original paper itself, indicating a sparse research direction with no sibling papers identified in the taxonomy structure.

The taxonomy reveals two main branches: provable explanation generation (where this work resides) and interpretable model applications using additive structures. Neighboring work includes Neural Additive Models for Clustering and Additive Models for Multi-Criteria Decision Aiding, both focused on practical applications rather than formal guarantees. The taxonomy narrative mentions related efforts like NeurCAM and Necessary Sufficient Explanations, which explore verification procedures and different notions of minimality, suggesting the paper connects to a broader interest in certified explanations but diverges by targeting cardinality optimality specifically for NAMs.

Among 11 candidates examined across three contributions, no refutable prior work was identified. The 'First provably sufficient explanations for NAMs' contribution examined 10 candidates with none providing overlapping prior work, while the 'Parallel interval importance sorting procedure' examined 1 candidate without refutation. The 'Model-specific algorithm' contribution examined no candidates. Given the limited search scope of 11 papers total, these statistics suggest the specific combination of cardinality minimality, provable sufficiency, and NAM-specific algorithms may be relatively unexplored, though the analysis does not exhaustively cover the literature.

Based on the limited search scope and sparse taxonomy position, the work appears to address a gap in providing formal guarantees for NAM explanations. However, the analysis covers only top-K semantic matches and does not exhaustively survey all explanation methods for additive models or verification techniques in interpretable ML. The absence of sibling papers and limited candidate examination suggest either genuine novelty in this specific problem formulation or incomplete coverage of related verification-based explanation work.

Taxonomy

Core-task Taxonomy Papers: 2
Claimed Contributions: 3
Contribution Candidate Papers Compared: 11
Refutable Papers: 0

Research Landscape Overview

Core task: Generating provably cardinally-minimal sufficient explanations for neural additive models.

The field centers on making neural additive models—architectures that decompose predictions into interpretable per-feature contributions—more transparent through rigorous explanation methods. The taxonomy divides into two main branches: one focused on provable explanation generation, which emphasizes formal guarantees and verification that explanations are both minimal and sufficient, and another on interpretable model applications that leverage additive structures for practical deployment. The provable branch tends to concentrate on algorithmic techniques that certify explanation quality, ensuring that no smaller subset of features can justify a prediction, while the applications branch explores how additive decompositions can be used in domains requiring inherent interpretability.

Within the provable explanation generation branch, recent work has explored different notions of minimality and sufficiency. Some studies develop verification procedures to confirm that a chosen feature subset is indeed cardinally minimal, while others investigate necessary and sufficient conditions for explanations, as seen in Necessary Sufficient Explanations[2]. NeurCAM[1] exemplifies efforts to integrate class activation mapping ideas with neural additive architectures, bridging visualization and formal explanation. The original paper, Explaining Neural Additive[0], sits squarely in the provable branch, emphasizing cardinally-minimal sufficient explanations with verification. Compared to neighboring works, it appears to prioritize formal guarantees of minimality over heuristic or approximate methods, aiming to certify that no redundant features remain in the explanation while maintaining prediction sufficiency.
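For concreteness, a NAM computes its prediction as a sum of learned univariate shape functions, one per input feature. The following minimal sketch (all names are illustrative, not from the paper) shows this additive forward pass, with toy lambdas standing in for the small univariate subnetworks:

```python
import math

def nam_predict(x, shape_functions, bias=0.0):
    """Neural additive model: each feature passes through its own
    univariate function f_i, the per-feature contributions are summed,
    and a sigmoid yields a probability for binary classification."""
    logit = bias + sum(f_i(x_i) for f_i, x_i in zip(shape_functions, x))
    return 1.0 / (1.0 + math.exp(-logit))

# Toy shape functions standing in for small univariate subnetworks.
shapes = [lambda v: 2.0 * v, lambda v: v * v, lambda v: -v]
p = nam_predict([0.5, 1.0, 0.25], shapes)  # logit = 1.0 + 1.0 - 0.25
```

Because each f_i depends on a single feature, each contribution can be inspected, bounded, or verified in isolation, which is the structural property the provable branch exploits.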

Claimed Contributions

Model-specific algorithm for cardinally-minimal explanations in NAMs

The authors introduce an algorithm tailored to Neural Additive Models that efficiently computes cardinally-minimal sufficient explanations. Unlike general neural networks requiring exponential queries, this method exploits NAMs' additive structure to achieve logarithmic query complexity through parallelized preprocessing and binary search.

0 retrieved papers
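As a rough illustration of how a precomputed importance order turns the search into a logarithmic number of verification calls, here is a hedged sketch. All names are illustrative: `is_sufficient` stands in for one (NP-hard) verification query, and monotone sufficiency along the sorted order is assumed, as the contribution describes.

```python
def min_sufficient_prefix(sorted_features, is_sufficient):
    """Binary-search the smallest k such that the top-k features
    (by the precomputed importance order) form a sufficient subset.
    Uses O(log n) calls to the sufficiency oracle."""
    lo, hi = 0, len(sorted_features)  # invariant: a prefix of size hi suffices
    while lo < hi:
        mid = (lo + hi) // 2
        if is_sufficient(sorted_features[:mid]):
            hi = mid
        else:
            lo = mid + 1
    return sorted_features[:lo]

# Toy oracle: sufficiency = accumulated "importance mass" >= threshold.
features = ["f3", "f1", "f7", "f2"]  # already sorted by importance
weights = {"f3": 5, "f1": 3, "f7": 1, "f2": 1}
oracle = lambda s: sum(weights[f] for f in s) >= 7
smallest = min_sufficient_prefix(features, oracle)  # -> ["f3", "f1"]
```

The toy oracle is only for demonstration; in the paper's setting each oracle call would be a formal verification query over the NAM.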
Parallel interval importance sorting procedure

The authors develop a preprocessing stage that operates in parallel on each univariate NAM component to compute importance intervals and establish a total ordering of features. This parallelized approach substantially reduces computational overhead by working on small univariate functions rather than the full model.

1 retrieved paper
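A simplified sketch of this idea, under the assumption that per-feature importance can be summarized by the output range of each univariate component: intervals are computed independently (hence trivially parallelizable), then features are ordered by interval width. The grid-based bounding and all function names here are illustrative stand-ins for the paper's certified interval computation.

```python
from concurrent.futures import ThreadPoolExecutor

def contribution_interval(f_i, lo, hi, grid=1000):
    """Bound the range of one univariate component f_i over [lo, hi]
    by dense grid evaluation -- a stand-in for a certified bound."""
    vals = [f_i(lo + (hi - lo) * k / grid) for k in range(grid + 1)]
    return min(vals), max(vals)

def rank_features(shape_functions, domains):
    """Compute all intervals in parallel (each component is independent),
    then order features by interval width, a simple importance proxy."""
    with ThreadPoolExecutor() as pool:
        intervals = list(pool.map(
            lambda args: contribution_interval(args[0], *args[1]),
            zip(shape_functions, domains)))
    widths = [hi - lo for lo, hi in intervals]
    order = sorted(range(len(widths)), key=lambda i: -widths[i])
    return order, intervals

shapes = [lambda v: 0.1 * v, lambda v: v * v, lambda v: -3.0 * v]
order, intervals = rank_features(shapes, [(0, 1), (0, 1), (0, 1)])
```

Because each interval is derived from a small univariate function rather than the full model, the preprocessing cost stays low even as the number of features grows.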
First provably sufficient explanations for NAMs

The authors present the first approach for generating explanations with provable sufficiency guarantees specifically for Neural Additive Models. This advances the trustworthiness of NAMs in safety-critical applications where formal guarantees are essential.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Model-specific algorithm for cardinally-minimal explanations in NAMs

No candidate papers were retrieved for comparison against this contribution, so no refutation analysis was possible.

Contribution

Parallel interval importance sorting procedure

One candidate paper was examined; it did not refute the claimed novelty.

Contribution

First provably sufficient explanations for NAMs

Ten candidate papers were examined; none provided overlapping prior work.
