Abstract:

DINOv2 sees the world well enough to guide robots and segment images, but we still do not know what it sees. We conduct the first comprehensive analysis of DINOv2’s representational structure using overcomplete dictionary learning, extracting over 32,000 visual concepts in what constitutes the largest interpretability demonstration for any vision foundation model to date. This method provides the backbone of our study, which unfolds in three parts.

In the first part, we analyze how different downstream tasks recruit concepts from our learned dictionary, revealing functional specialization: classification exploits “Elsewhere” concepts that fire everywhere except on target objects, implementing learned negations; segmentation relies exclusively on boundary detectors forming coherent subspaces; depth estimation draws on three distinct monocular cue families matching visual neuroscience principles.

Turning to concept geometry and statistics, we find the learned dictionary deviates from ideal near-orthogonal (Grassmannian) structure, exhibiting higher coherence than random baselines. Concept atoms are not aligned with the neuron basis, confirming distributed encoding. We discover antipodal concept pairs that encode opposite semantics (e.g., “white shirt” vs “black shirt”), creating signed semantic axes. Separately, we identify concepts that activate exclusively on register tokens, revealing that these tokens encode global scene properties such as motion blur and illumination. Across layers, positional information collapses toward a 2D sheet, yet within single images token geometry remains smooth and clustered even after position is removed, calling into question a purely sparse-coding view of representation.

To resolve this paradox, we advance a different view: tokens are formed by combining convex mixtures of a few archetypes (e.g., a rabbit among animals, brown among colors, fluffy among textures). Multi-head attention directly implements this construction, with activations behaving like sums of convex regions. In this picture, concepts are expressed by proximity to landmarks and by membership in regions, not by unbounded linear directions. We call this the Minkowski Representation Hypothesis (MRH), and we examine its empirical signals and consequences for how we study, steer, and interpret vision-transformer representations.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper extracts over 32,000 visual concepts from DINOv2 using overcomplete dictionary learning, positioning itself within the 'Comprehensive Representational Structure Analysis' leaf of the taxonomy. This leaf contains only one paper (the original work itself), indicating a relatively sparse research direction focused on holistic geometric and statistical analysis of foundation model representations. The work sits under the broader 'Representation Analysis and Geometry' branch, which encompasses four distinct approaches to understanding embedding spaces, suggesting this is an emerging rather than saturated area of inquiry.

The taxonomy reveals neighboring leaves examining related but distinct aspects: 'Representation Enhancement' focuses on improving classifiability and robustness, 'Causal Representation Learning' develops theoretical frameworks connecting causal factors to concepts, and 'Viewpoint and Stability Analysis' studies out-of-distribution behavior. The paper's emphasis on functional specialization across tasks (classification, segmentation, depth estimation) and geometric properties (coherence, orthogonality) distinguishes it from these adjacent directions. It also differs from the 'Sparse Autoencoder-Based Concept Discovery' branch by conducting comprehensive structural analysis rather than focusing on SAE architecture design or domain-specific applications.

Among the 28 candidates examined across the three contributions, no clearly refuting prior work was identified. For the 32,000-concept dictionary, 10 candidates were examined with zero refutations, suggesting substantial scale novelty within the limited search scope. Task-specific recruitment analysis (10 candidates, zero refutations) and the Minkowski Representation Hypothesis (8 candidates, zero refutations) likewise show no direct overlap among the examined papers. Within the top-K semantic matches and citation expansion performed, these contributions appear distinct, though the search represents a targeted rather than exhaustive literature review.

Based on the limited search of 28 candidates, the work appears to occupy a relatively unexplored position combining large-scale concept extraction with systematic geometric analysis. The taxonomy structure confirms this is not a crowded research direction, with the paper being the sole occupant of its leaf. However, the analysis cannot rule out relevant work outside the examined candidate set, particularly in adjacent areas like sparse coding theory or neuroscience-inspired representation analysis that may not have surfaced in semantic search.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 0

Research Landscape Overview

Core task: interpretability of vision foundation models through concept extraction. The field has organized itself around several complementary strategies for making large vision models more transparent. Sparse Autoencoder-Based Concept Discovery (e.g., Archetypal SAE[4], Monosemantic Features[23]) seeks to decompose learned representations into interpretable units, while Concept Bottleneck Architectures (e.g., Concept Bottleneck Models[8], DCBM[46]) build models that explicitly route predictions through human-understandable concepts. Concept-Based Post-Hoc Explanation methods (e.g., Visual-tcav[20], FastCAV[44]) analyze trained models without modifying their structure, and Transformer-Specific Interpretability focuses on attention mechanisms and token interactions. Vision-Language Model Interpretability (e.g., Visual Interpretability CLIP[32], CBVLM[18]) leverages textual alignment to ground visual features, while Representation Analysis and Geometry examines the underlying structure of embedding spaces. Domain-Specific Foundation Model Applications (e.g., Pathology Foundation Embeddings[1], Retinal Disease Concepts[5]) adapt these techniques to specialized fields, and Prototype-Based Explainability (e.g., ProtoS-ViT[21]) uses exemplar instances to clarify model reasoning.

A central tension runs through the field between methods that impose interpretability constraints during training and those that extract explanations post-hoc. Sparse autoencoder approaches promise fine-grained feature decomposition but face challenges in scaling and ensuring semantic coherence, while concept bottleneck methods trade some predictive flexibility for guaranteed interpretability.

Within Representation Analysis and Geometry, Rabbit Hull[0] conducts a comprehensive examination of representational structure, analyzing how embedding spaces organize semantic information across layers and modalities.
This work sits alongside efforts like Interpretable Subspaces[7] that identify meaningful directions in latent space, but emphasizes a more holistic structural perspective rather than isolating individual concept vectors. Compared to domain-focused studies like Pathology Foundation Embeddings[1] or Retinal Disease Concepts[5], Rabbit Hull[0] takes a broader view of geometric properties that generalize across vision tasks, contributing foundational insights into how foundation models internally represent visual knowledge.

Claimed Contributions

32,000-concept dictionary for DINOv2 via stable sparse autoencoders

The authors extract a dictionary of 32,000 interpretable concepts from DINOv2 using sparse autoencoders with stability constraints. This represents the largest-scale concept extraction for a vision foundation model and provides the empirical basis for analyzing task-specific concept recruitment and geometric structure.

10 retrieved papers
Task-specific concept recruitment analysis revealing functional specialization

The authors demonstrate that different downstream tasks (classification, segmentation, depth estimation) selectively activate distinct, low-dimensional subsets of the concept space. They identify task-specific patterns such as Elsewhere concepts for classification, border detectors for segmentation, and three families of monocular depth cues.

10 retrieved papers
Minkowski Representation Hypothesis as alternative to linear sparse coding

The authors propose the Minkowski Representation Hypothesis (MRH), which posits that tokens are formed by combining convex mixtures of archetypes rather than unbounded linear directions. They show that multi-head attention naturally implements this geometry through Minkowski sums of convex polytopes, offering an alternative geometric framework to the Linear Representation Hypothesis.

8 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the retrieved top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

32,000-concept dictionary for DINOv2 via stable sparse autoencoders

The authors extract a dictionary of 32,000 interpretable concepts from DINOv2 using sparse autoencoders with stability constraints. This represents the largest-scale concept extraction for a vision foundation model and provides the empirical basis for analyzing task-specific concept recruitment and geometric structure.
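The paper does not publish its training recipe in this report, but the basic mechanism of extracting an overcomplete concept dictionary with a sparse autoencoder can be sketched as follows. This is a minimal illustration assuming a TopK-style sparsity rule and random (untrained) weights; the sizes, variable names, and activation rule are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; the paper's setting would be ~768-dim tokens and ~32k atoms.
d_model, n_concepts, k = 64, 512, 8

W_enc = rng.standard_normal((d_model, n_concepts)) / np.sqrt(d_model)
b_enc = np.zeros(n_concepts)
# Decoder rows are the dictionary atoms ("concepts"); unit-normalize them.
W_dec = rng.standard_normal((n_concepts, d_model))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)

def encode(x):
    """TopK encoder: keep the k largest pre-activations per token, zero the rest."""
    pre = x @ W_enc + b_enc
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    z = np.zeros_like(pre)
    np.put_along_axis(z, idx, np.take_along_axis(pre, idx, axis=-1), axis=-1)
    return np.maximum(z, 0.0)            # ReLU on the surviving coefficients

def decode(z):
    return z @ W_dec                      # token ~ sparse combination of atoms

tokens = rng.standard_normal((4, d_model))   # stand-in for DINOv2 patch tokens
z = encode(tokens)
recon = decode(z)
print(z.shape, (z != 0).sum(axis=-1))        # at most k active concepts per token
```

Training would minimize the reconstruction error of `decode(encode(x))` over a large corpus of DINOv2 activations; the "stability constraints" mentioned above would be an additional ingredient not modeled here.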

Contribution

Task-specific concept recruitment analysis revealing functional specialization

The authors demonstrate that different downstream tasks (classification, segmentation, depth estimation) selectively activate distinct, low-dimensional subsets of the concept space. They identify task-specific patterns such as Elsewhere concepts for classification, border detectors for segmentation, and three families of monocular depth cues.
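One simple way to operationalize "recruitment" of this kind is to ask which dictionary atoms a task's linear probe loads on. The sketch below is a hypothetical stand-in for such an analysis, not the authors' procedure: it builds toy probes from a few hidden atoms and recovers them by cosine similarity against the dictionary.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_concepts = 128, 512   # illustrative sizes

# Unit-norm dictionary atoms (rows), standing in for the learned concepts.
D = rng.standard_normal((n_concepts, d_model))
D /= np.linalg.norm(D, axis=1, keepdims=True)

def toy_probe(atom_ids, noise=0.05):
    """A task head built from a small concept subset plus noise."""
    w = D[atom_ids].sum(axis=0) + noise * rng.standard_normal(d_model)
    return w / np.linalg.norm(w)

probes = {
    "classification": toy_probe([3, 17, 42]),
    "segmentation":   toy_probe([100, 101]),
}

def recruitment(w, top=10):
    """Score every atom by |cos(w, atom)|; return the top atoms and the
    fraction of squared similarity mass they capture."""
    scores = np.abs(D @ w)
    order = np.argsort(scores)[::-1]
    energy = np.cumsum(scores[order] ** 2) / np.sum(scores ** 2)
    return order[:top], energy[top - 1]

for task, w in probes.items():
    atoms, frac = recruitment(w)
    print(task, atoms[:5], round(float(frac), 3))
```

In this toy setting each probe's constituent atoms dominate the ranking, which is the "low-dimensional subset" signature the contribution describes; the real analysis would use trained probes over actual DINOv2 features.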

Contribution

Minkowski Representation Hypothesis as alternative to linear sparse coding

The authors propose the Minkowski Representation Hypothesis (MRH), which posits that tokens are formed by combining convex mixtures of archetypes rather than unbounded linear directions. They show that multi-head attention naturally implements this geometry through Minkowski sums of convex polytopes, offering an alternative geometric framework to the Linear Representation Hypothesis.
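The geometric claim can be made concrete with a small sketch. Under the MRH as summarized above, a token is a Minkowski sum of per-head contributions, each a convex combination of that head's archetypes; unlike an unbounded linear combination, the mixture weights are confined to the probability simplex, so tokens live in a bounded region. Sizes and sampling choices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_heads, n_arch = 16, 4, 5   # illustrative sizes, not the paper's

# Per-head archetype sets: head h contributes a point of the polytope conv(A[h]).
A = rng.standard_normal((n_heads, n_arch, d))

def simplex_sample(n):
    """Uniform sample from the probability simplex (convex mixture weights)."""
    w = rng.exponential(size=n)
    return w / w.sum()

def token():
    """MRH-style token: sum over heads of one convex mixture of archetypes each,
    i.e., a point in the Minkowski sum conv(A[0]) + ... + conv(A[n_heads-1])."""
    return sum(simplex_sample(n_arch) @ A[h] for h in range(n_heads))

t = token()
# Membership in the Minkowski sum holds by construction: every per-head weight
# vector is nonnegative and sums to 1, so each summand lies in its head's polytope.
print(t.shape)
```

Contrast this with the Linear Representation Hypothesis, where a token would be an unconstrained linear combination of concept directions; here "reading out" a concept means locating the token relative to archetype landmarks and regions.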