Hierarchical Concept-based Interpretable Models

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Explainable Artificial Intelligence, Concept-based Explainability, Concept Discovery, Concept Hierarchy, Concept Bottleneck Models, Concept Embedding Models, Clustering, Sparse Autoencoders
Abstract:

Modern deep neural networks remain challenging to interpret due to the opacity of their latent representations, impeding model understanding, debugging, and debiasing. Concept Embedding Models (CEMs) address this by mapping inputs to human-interpretable concept representations from which tasks can be predicted. Yet, CEMs fail to represent inter-concept relationships and require concept annotations at different granularities during training, limiting their applicability. In this paper, we introduce Hierarchical Concept Embedding Models (HiCEMs), a new family of CEMs that explicitly model concept relationships through hierarchical structures. To enable HiCEMs in real-world settings, we propose Concept Splitting, a method for automatically discovering finer-grained sub-concepts from a pretrained CEM’s embedding space without requiring additional annotations. This allows HiCEMs to generate fine-grained explanations from limited concept labels, reducing annotation burdens. Our evaluation across multiple datasets, including a user study and experiments on PseudoKitchens, a newly proposed concept-based dataset of 3D kitchen renders, demonstrates that (1) Concept Splitting discovers human-interpretable sub-concepts absent during training that can be used to train highly accurate HiCEMs, and (2) HiCEMs enable powerful test-time concept interventions at different granularities, leading to improved task accuracy.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Hierarchical Concept Embedding Models (HiCEMs) and a Concept Splitting method to automatically discover fine-grained sub-concepts from pretrained concept embeddings. It resides in the 'Concept Bottleneck Models and Extensions' leaf, which contains six papers including the original work. This leaf sits within the broader 'Concept-Based Interpretability Architectures' branch, indicating a moderately populated research direction focused on architectures that explicitly incorporate human-interpretable concepts as intermediate representations during model design.

The taxonomy reveals neighboring leaves addressing related but distinct approaches: 'Part-Whole Hierarchical Architectures' explores parsing inputs into dynamic part-whole structures, while 'Semantic Tree and Taxonomy-Driven Architectures' embeds predefined hierarchical taxonomies into network structure. The sibling papers in the same leaf include works on hierarchical concept bottlenecks and tabular concept bottleneck models, suggesting active exploration of structured concept representations. The paper's focus on learning hierarchical relationships from limited annotations distinguishes it from methods requiring extensive predefined taxonomies or post-hoc concept extraction.

Among thirty candidates examined, the Concept Splitting method (ten candidates, zero refutations) and HiCEMs architecture (ten candidates, zero refutations) appear relatively novel within this limited search scope. The PseudoKitchens dataset contribution shows one refutable candidate among ten examined, indicating potential overlap with existing concept-based datasets. The statistics suggest that the core methodological contributions—automatic sub-concept discovery and hierarchical concept modeling—face less direct prior work among the examined candidates, though the dataset component encounters more substantial precedent.

Based on the top-thirty semantic matches and citation expansion, the work appears to occupy a distinct position within concept bottleneck research by combining automatic hierarchy discovery with reduced annotation requirements. However, the limited search scope means potentially relevant work in adjacent areas—such as hierarchical concept discovery methods or prototype-based concept learning—may not have been fully examined. The analysis captures the paper's positioning within its immediate research neighborhood but cannot claim exhaustive coverage of all related prior art.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Paper: 1

Research Landscape Overview

Core task: Modeling hierarchical concept relationships in interpretable neural networks. The field has evolved into several distinct branches that address interpretability from complementary angles. Concept-Based Interpretability Architectures focus on embedding human-understandable concepts directly into model design, often through bottleneck layers or structured representations that enforce hierarchical reasoning. Concept Discovery and Extraction Methods aim to uncover latent concepts from trained networks, revealing what models have learned without explicit supervision. Post-Hoc Explanation Methods provide interpretability after training by analyzing activations or generating explanations for existing black-box models. Domain-Specific Interpretable Applications tailor these techniques to fields like medicine, agriculture, and engineering, while Interpretability Frameworks and Methodological Foundations establish theoretical grounding and evaluation standards.

Works like Part-Whole Hierarchies[1] and Concept Pyramid Scheme[7] illustrate how architectures can encode multi-level concept structures, whereas approaches such as Attention-Guided Graph[3] demonstrate graph-based reasoning over concept relationships. Within Concept-Based Interpretability Architectures, a particularly active line explores concept bottleneck models and their extensions, balancing prediction accuracy with human-interpretable intermediate representations. Hierarchical Concept Models[0] sits squarely in this space, emphasizing explicit modeling of hierarchical relationships among concepts rather than treating them as flat, independent features. This contrasts with simpler bottleneck approaches and aligns closely with works like Hierarchical Concept Bottleneck[23], which similarly structures concepts in multi-level taxonomies, and TabCBM[19], which adapts bottleneck ideas to tabular domains.
The central tension across these methods involves trade-offs between expressiveness, intervention capability, and computational overhead: richer hierarchies can capture nuanced domain knowledge but may complicate training or require more extensive concept annotations. Hierarchical Concept Models[0] addresses this by proposing mechanisms to learn and leverage concept dependencies, positioning itself as a bridge between purely data-driven discovery and heavily supervised structured models.

Claimed Contributions

Concept Splitting method for discovering sub-concepts

The authors propose Concept Splitting, a method that uses sparse autoencoders to automatically discover finer-grained sub-concepts from a pretrained Concept Embedding Model's embedding space without requiring additional annotations. This enables models to generate fine-grained explanations from limited concept labels.
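The mechanism described above — fitting a sparse autoencoder to a pretrained CEM's concept embedding space so that individual sparse latents act as candidate sub-concepts — can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the embedding dimension, dictionary size, L1 weight, and plain gradient descent are all arbitrary choices, and random vectors stand in for real concept embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained CEM's concept embeddings:
# n samples, each a d-dimensional embedding of one concept.
n, d, k = 512, 16, 32        # k = sparse dictionary size (assumed overcomplete, k > d)
X = rng.normal(size=(n, d))

# Sparse autoencoder parameters: encoder W_e, decoder W_d, encoder bias.
W_e = rng.normal(scale=0.1, size=(d, k))
W_d = rng.normal(scale=0.1, size=(k, d))
b_e = np.zeros(k)
l1, lr = 1e-3, 1e-2          # assumed sparsity weight and learning rate

for step in range(200):
    # Encode: ReLU activations -> sparse codes; each active latent is
    # read as one candidate sub-concept being present in the embedding.
    Z = np.maximum(X @ W_e + b_e, 0.0)
    X_hat = Z @ W_d
    err = X_hat - X                      # reconstruction residual

    # Gradients of 0.5*||X_hat - X||^2 + l1*||Z||_1 w.r.t. parameters.
    dZ = err @ W_d.T + l1 * np.sign(Z)
    dZ[Z <= 0] = 0.0                     # ReLU gate
    W_d -= lr * (Z.T @ err) / n
    W_e -= lr * (X.T @ dZ) / n
    b_e -= lr * dZ.mean(axis=0)

Z = np.maximum(X @ W_e + b_e, 0.0)
sparsity = (Z > 0).mean()                # fraction of active latents per sample
recon = np.mean((Z @ W_d - X) ** 2)
print(f"mean active fraction: {sparsity:.2f}, recon MSE: {recon:.3f}")
```

In a real pipeline, each latent's most-activating inputs would then be inspected to decide whether it corresponds to a human-interpretable sub-concept.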

10 retrieved papers
Hierarchical Concept Embedding Models (HiCEMs)

The authors introduce HiCEMs, a new family of concept-based models that explicitly model hierarchical relationships between concepts through structured architectures. HiCEMs enable test-time concept interventions at different granularity levels while maintaining interpretability.

10 retrieved papers
PseudoKitchens dataset

The authors create PseudoKitchens, a new concept-based dataset consisting of synthetic photorealistic 3D kitchen renders that provides perfect ground-truth concept annotations. This dataset enables rigorous evaluation of concept-based models with complete control over scene generation.
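The "perfect ground-truth by construction" property of a synthetic dataset like this can be made concrete with a small sketch: because every render is produced from an explicit scene specification, the concept labels fall out of the spec with no annotation noise. The concept vocabulary below is a hypothetical placeholder; the actual PseudoKitchens attributes are not specified here.

```python
import random

random.seed(0)

# Hypothetical concept vocabulary for a kitchen scene (illustrative only).
CONCEPTS = ["has_oven", "has_fridge", "has_kettle", "has_plant"]

def sample_scene():
    """Sample a scene specification. The same spec would drive the 3D
    renderer AND serve as the concept labels, so ground truth is exact
    by construction -- no human annotation step is involved."""
    spec = {c: random.random() < 0.5 for c in CONCEPTS}
    labels = {c: int(present) for c, present in spec.items()}
    return spec, labels

scenes = [sample_scene() for _ in range(4)]
for spec, labels in scenes:
    assert all(labels[c] == int(spec[c]) for c in CONCEPTS)
```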

10 retrieved papers (1 can refute)

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Concept Splitting method for discovering sub-concepts


Contribution

Hierarchical Concept Embedding Models (HiCEMs)


Contribution

PseudoKitchens dataset

