Hierarchical Concept-based Interpretable Models

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Explainable Artificial Intelligence, Concept-based Explainability, Concept Discovery, Concept Hierarchy, Concept Bottleneck Models, Concept Embedding Models, Clustering, Sparse Autoencoders
Abstract:

Modern deep neural networks remain challenging to interpret due to the opacity of their latent representations, impeding model understanding, debugging, and debiasing. Concept Embedding Models (CEMs) address this by mapping inputs to human-interpretable concept representations from which tasks can be predicted. Yet, CEMs fail to represent inter-concept relationships and require concept annotations at different granularities during training, limiting their applicability. In this paper, we introduce Hierarchical Concept Embedding Models (HiCEMs), a new family of CEMs that explicitly model concept relationships through hierarchical structures. To enable HiCEMs in real-world settings, we propose Concept Splitting, a method for automatically discovering finer-grained sub-concepts from a pretrained CEM’s embedding space without requiring additional annotations. This allows HiCEMs to generate fine-grained explanations from limited concept labels, reducing annotation burdens. Our evaluation across multiple datasets, including a user study and experiments on PseudoKitchens, a newly proposed concept-based dataset of 3D kitchen renders, demonstrates that (1) Concept Splitting discovers human-interpretable sub-concepts absent during training that can be used to train highly accurate HiCEMs, and (2) HiCEMs enable powerful test-time concept interventions at different granularities, leading to improved task accuracy.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Hierarchical Concept Embedding Models (HiCEMs) and a Concept Splitting method to automatically discover fine-grained sub-concepts from pretrained concept embeddings. It resides in the 'Concept Bottleneck Models and Extensions' leaf, which contains six papers including the original work. This leaf sits within the broader 'Concept-Based Interpretability Architectures' branch, indicating a moderately populated research direction focused on architectures that explicitly incorporate human-interpretable concepts as intermediate representations during model design.

The taxonomy reveals neighboring leaves addressing related but distinct approaches: 'Part-Whole Hierarchical Architectures' explores parsing inputs into dynamic part-whole structures, while 'Semantic Tree and Taxonomy-Driven Architectures' embeds predefined hierarchical taxonomies into network structure. The sibling papers in the same leaf include works on hierarchical concept bottlenecks and tabular concept bottleneck models, suggesting active exploration of structured concept representations. The paper's focus on learning hierarchical relationships from limited annotations distinguishes it from methods requiring extensive predefined taxonomies or post-hoc concept extraction.

Among thirty candidates examined, the Concept Splitting method (ten candidates, zero refutations) and HiCEMs architecture (ten candidates, zero refutations) appear relatively novel within this limited search scope. The PseudoKitchens dataset contribution shows one refutable candidate among ten examined, indicating potential overlap with existing concept-based datasets. The statistics suggest that the core methodological contributions—automatic sub-concept discovery and hierarchical concept modeling—face less direct prior work among the examined candidates, though the dataset component encounters more substantial precedent.

Based on the top-thirty semantic matches and citation expansion, the work appears to occupy a distinct position within concept bottleneck research by combining automatic hierarchy discovery with reduced annotation requirements. However, the limited search scope means potentially relevant work in adjacent areas—such as hierarchical concept discovery methods or prototype-based concept learning—may not have been fully examined. The analysis captures the paper's positioning within its immediate research neighborhood but cannot claim exhaustive coverage of all related prior art.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Paper: 1

Research Landscape Overview

Core task: Modeling hierarchical concept relationships in interpretable neural networks. The field has evolved into several distinct branches that address interpretability from complementary angles. Concept-Based Interpretability Architectures focus on embedding human-understandable concepts directly into model design, often through bottleneck layers or structured representations that enforce hierarchical reasoning. Concept Discovery and Extraction Methods aim to uncover latent concepts from trained networks, revealing what models have learned without explicit supervision. Post-Hoc Explanation Methods provide interpretability after training by analyzing activations or generating explanations for existing black-box models. Domain-Specific Interpretable Applications tailor these techniques to fields like medicine, agriculture, and engineering, while Interpretability Frameworks and Methodological Foundations establish theoretical grounding and evaluation standards.

Works like Part-Whole Hierarchies[1] and Concept Pyramid Scheme[7] illustrate how architectures can encode multi-level concept structures, whereas approaches such as Attention-Guided Graph[3] demonstrate graph-based reasoning over concept relationships. Within Concept-Based Interpretability Architectures, a particularly active line explores concept bottleneck models and their extensions, balancing prediction accuracy with human-interpretable intermediate representations. Hierarchical Concept Models[0] sits squarely in this space, emphasizing explicit modeling of hierarchical relationships among concepts rather than treating them as flat, independent features. This contrasts with simpler bottleneck approaches and aligns closely with works like Hierarchical Concept Bottleneck[23], which similarly structures concepts in multi-level taxonomies, and TabCBM[19], which adapts bottleneck ideas to tabular domains.
The central tension across these methods involves trade-offs between expressiveness, intervention capability, and computational overhead: richer hierarchies can capture nuanced domain knowledge but may complicate training or require more extensive concept annotations. Hierarchical Concept Models[0] addresses this by proposing mechanisms to learn and leverage concept dependencies, positioning itself as a bridge between purely data-driven discovery and heavily supervised structured models.

Claimed Contributions

Concept Splitting method for discovering sub-concepts

The authors propose Concept Splitting, a method that uses sparse autoencoders to automatically discover finer-grained sub-concepts from a pretrained Concept Embedding Model's embedding space without requiring additional annotations. This enables models to generate fine-grained explanations from limited concept labels.
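The mechanism described above — fitting a sparse autoencoder to a pretrained CEM's concept embedding space so that individual sparse latents act as candidate sub-concepts — can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the embedding dimension, dictionary size, L1 weight, and plain gradient descent are all arbitrary choices, and random vectors stand in for real concept embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained CEM's concept embeddings:
# n samples, each a d-dimensional embedding of one concept.
n, d, k = 512, 16, 32        # k = sparse dictionary size (assumed overcomplete, k > d)
X = rng.normal(size=(n, d))

# Sparse autoencoder parameters: encoder W_e, decoder W_d, encoder bias.
W_e = rng.normal(scale=0.1, size=(d, k))
W_d = rng.normal(scale=0.1, size=(k, d))
b_e = np.zeros(k)
l1, lr = 1e-3, 1e-2          # assumed sparsity weight and learning rate

for step in range(200):
    # Encode: ReLU activations -> sparse codes; each active latent is
    # read as one candidate sub-concept being present in the embedding.
    Z = np.maximum(X @ W_e + b_e, 0.0)
    X_hat = Z @ W_d
    err = X_hat - X                      # reconstruction residual

    # Gradients of 0.5*||X_hat - X||^2 + l1*||Z||_1 w.r.t. parameters.
    dZ = err @ W_d.T + l1 * np.sign(Z)
    dZ[Z <= 0] = 0.0                     # ReLU gate
    W_d -= lr * (Z.T @ err) / n
    W_e -= lr * (X.T @ dZ) / n
    b_e -= lr * dZ.mean(axis=0)

Z = np.maximum(X @ W_e + b_e, 0.0)
sparsity = (Z > 0).mean()                # fraction of active latents per sample
recon = np.mean((Z @ W_d - X) ** 2)
print(f"mean active fraction: {sparsity:.2f}, recon MSE: {recon:.3f}")
```

In a real pipeline, each latent's most-activating inputs would then be inspected to decide whether it corresponds to a human-interpretable sub-concept.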

10 retrieved papers
Hierarchical Concept Embedding Models (HiCEMs)

The authors introduce HiCEMs, a new family of concept-based models that explicitly model hierarchical relationships between concepts through structured architectures. HiCEMs enable test-time concept interventions at different granularity levels while maintaining interpretability.

10 retrieved papers
PseudoKitchens dataset

The authors create PseudoKitchens, a new concept-based dataset consisting of synthetic photorealistic 3D kitchen renders that provides perfect ground-truth concept annotations. This dataset enables rigorous evaluation of concept-based models with complete control over scene generation.
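The "perfect ground-truth by construction" property of a synthetic dataset like this can be made concrete with a small sketch: because every render is produced from an explicit scene specification, the concept labels fall out of the spec with no annotation noise. The concept vocabulary below is a hypothetical placeholder; the actual PseudoKitchens attributes are not specified here.

```python
import random

random.seed(0)

# Hypothetical concept vocabulary for a kitchen scene (illustrative only).
CONCEPTS = ["has_oven", "has_fridge", "has_kettle", "has_plant"]

def sample_scene():
    """Sample a scene specification. The same spec would drive the 3D
    renderer AND serve as the concept labels, so ground truth is exact
    by construction -- no human annotation step is involved."""
    spec = {c: random.random() < 0.5 for c in CONCEPTS}
    labels = {c: int(present) for c, present in spec.items()}
    return spec, labels

scenes = [sample_scene() for _ in range(4)]
for spec, labels in scenes:
    assert all(labels[c] == int(spec[c]) for c in CONCEPTS)
```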

10 retrieved papers (1 can refute)

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Concept Splitting method for discovering sub-concepts


Contribution

Hierarchical Concept Embedding Models (HiCEMs)


Contribution

PseudoKitchens dataset

