Vulcan: Crafting Compact Class-Specific Vision Transformers For Edge Intelligence

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: class-specific model derivation, Vision Transformer, structured pruning, edge intelligence
Abstract:

Large Vision Transformers (ViTs) must often be compressed before they can be deployed on resource-constrained edge devices. However, many edge devices require only part of a pre-trained ViT's all-class knowledge in their corresponding application scenarios, a fact overlooked by existing compression methods. Lightweight models produced by these methods retain a substantial amount of class-irrelevant knowledge and suffer from suboptimal performance on target classes. To address this, we analyze the knowledge distribution of ViTs and reveal a knowledge disentanglement within them: neurons in the feed-forward network (FFN) modules encode class-specific knowledge, while the multi-head attention (MHA) modules capture class-agnostic patterns. Building on this insight, we introduce Vulcan, a pruning-oriented post-training method for deriving compact class-specific models from a pre-trained ViT under given resource budgets. Vulcan follows a novel train-then-prune paradigm that deliberately introduces redundancy into ViTs by collapsing FFN neurons onto those with the highest class-specific activations and by enforcing low-rankness in MHA weights. This design mitigates the irreversible knowledge loss of direct pruning, so the post-trained model can be compressed into a compact one with negligible performance loss. Notably, the derived edge ViTs not only achieve significant reductions in size and computation but can even surpass the original ViTs in performance on specific classes. Comprehensive experiments with five base ViTs, covering three representative visual tasks on four datasets, demonstrate that Vulcan-derived ViTs outperform the base ViTs on class-specific tasks by up to 15.12% in accuracy at only 20%–40% of their size. Compared with state-of-the-art structured pruning methods, Vulcan improves class-specific accuracy by up to 13.92%. Code is available at Vulcan.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Vulcan, a pruning-oriented post-training method for deriving compact class-specific Vision Transformers from pre-trained models. It resides in the Class-Specific and Task-Adaptive Pruning leaf, which contains only two papers (Vulcan itself and NuWa). This represents a relatively sparse research direction within the broader Pruning-Based Compression branch, suggesting that class-specific adaptation in ViT pruning remains underexplored compared to general-purpose structured pruning or token compression methods.

The taxonomy reveals that Vulcan's neighboring research directions include Structured Pruning (two papers), Frequency-Domain Pruning (one paper), and Token Compression methods (four papers across two sub-leaves). While these adjacent areas focus on general-purpose compression or token-level reduction, Vulcan diverges by explicitly targeting class-irrelevant knowledge removal. The broader Compression Techniques branch contains quantization and low-rank methods, but none directly address the class-specific adaptation challenge that Vulcan emphasizes, positioning it at a distinct intersection of pruning and task-aware optimization.

Among the 27 candidates examined through semantic search and citation expansion, none clearly refute Vulcan's three core contributions. The knowledge disentanglement insight (10 candidates examined, 0 refutable) and the Vulcan method itself (10 candidates examined, 0 refutable) appear novel within this limited search scope. The class-centric neuron collapse and truncated nuclear norm regularization (7 candidates examined, 0 refutable) also show no direct prior overlap. However, this analysis is constrained by the search scale and does not constitute an exhaustive literature review.

Based on the top-27 semantic matches and the sparse taxonomy leaf (only one sibling paper), Vulcan appears to occupy a relatively novel position within class-specific ViT compression. The limited number of refutable candidates and the underexplored nature of class-adaptive pruning suggest meaningful originality, though the restricted search scope means potentially relevant work outside these candidates may exist. The knowledge disentanglement insight and train-then-prune paradigm represent the most distinctive contributions within this context.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 27
Refutable papers: 0

Research Landscape Overview

Core task: class-specific Vision Transformer compression for edge deployment. The field addresses the challenge of deploying large Vision Transformers (ViTs) on resource-constrained edge devices by developing methods that reduce model size, computational cost, and memory footprint while preserving accuracy. The taxonomy reveals a rich landscape organized around ten major branches, among them:

- Compression Techniques: pruning-based methods (including class-specific and task-adaptive pruning), quantization, and token reduction strategies such as Efficient Token Compression[4].
- Knowledge Distillation and Transfer: transferring learned representations from large teacher models to compact student networks, as in Manifold Distillation ViT[39].
- Distributed and Collaborative Inference: partitioning strategies (e.g., Partitioning ViT Edge[2]) and multi-device execution (Multi-Device Transformer Inference[41]).
- Hardware-Aware Optimization: targeting specific accelerators and FPGA implementations (FPGA ViT Quantization[46]).
- Lightweight Architecture Design: inherently efficient architectures such as MicroViT[18] and Lightweight ViT Design[5].
- Domain-Specific Applications: compression tailored to medical imaging (Medical ViT Deployment[7]) and other specialized tasks.
- Training and Adaptation Strategies: parameter-efficient fine-tuning methods such as LoRA ConvMixed-ViT[34].

Several active research directions reveal key trade-offs and open questions. Pruning-based approaches balance granularity (structured versus unstructured) against the need for task or class adaptability, while quantization methods must navigate accuracy-efficiency frontiers across diverse hardware backends. Token compression techniques, such as those surveyed in Token Compression Survey[21], offer dynamic inference benefits but raise the question of which tokens to retain under varying input conditions.
Within this landscape, Vulcan[0] sits in the Class-Specific and Task-Adaptive Pruning cluster, emphasizing tailored compression that adapts pruning decisions to specific classes or tasks. This contrasts with more general pruning frameworks like ViT Hybrid Pruning[16] or broader token-reduction schemes, and aligns closely with NuWa[25], which also explores adaptive strategies. Vulcan's focus on class-specific adaptation addresses a nuanced challenge: ensuring that compression does not disproportionately harm performance on particular categories, a concern particularly relevant for edge deployment where retraining opportunities are limited and diverse workloads are common.

Claimed Contributions

Knowledge disentanglement insight in Vision Transformers

The authors analyze the knowledge distribution within Vision Transformers and discover that feed-forward network (FFN) modules primarily encode class-specific knowledge, while multi-head attention (MHA) modules capture class-agnostic patterns. This insight forms the theoretical foundation for their compression approach.

10 retrieved papers
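The disentanglement claim can be illustrated with a small probe. The sketch below is a hypothetical experiment, not the paper's code: it scores how class-specific each FFN hidden neuron is by comparing its mean activation on its best-matching class against its mean activation across all classes, using synthetic activations in which each class excites a distinct neuron block.

```python
import numpy as np

# Hypothetical probe (assumed setup, not from the paper): estimate how
# class-specific each FFN hidden neuron is by comparing its mean activation
# on its preferred class against its overall mean activation.
rng = np.random.default_rng(0)
n_classes, n_per_class, d_hidden = 4, 64, 16

# Toy activations: neurons c*4 .. c*4+3 fire more strongly for class c.
acts = rng.normal(0.0, 1.0, (n_classes, n_per_class, d_hidden))
for c in range(n_classes):
    acts[c, :, c * 4:(c + 1) * 4] += 3.0
acts = np.maximum(acts, 0.0)  # ReLU-style gating keeps positive firings

class_mean = acts.mean(axis=1)           # (n_classes, d_hidden)
overall_mean = class_mean.mean(axis=0)   # (d_hidden,)
# Specificity: how much the best-matching class dominates a neuron's firing.
specificity = class_mean.max(axis=0) / (overall_mean + 1e-8)
top_class = class_mean.argmax(axis=0)    # which class each neuron prefers

print(specificity.round(2))
print(top_class)
```

Neurons with high specificity scores would be candidates for the anchor role in class-specific pruning, while neurons with scores near 1 fire uniformly across classes.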
Vulcan method for deriving compact class-specific ViTs

The authors introduce Vulcan, a pruning-oriented post-training method that derives compact class-specific Vision Transformers from pre-trained models. Vulcan follows a novel train-then-prune paradigm that deliberately introduces redundancy before pruning, minimizing irreversible knowledge loss during compression.

10 retrieved papers
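The train-then-prune paradigm can be sketched on a single weight matrix. The toy below shows assumed mechanics rather than the paper's implementation: phase 1 ("post-training") deliberately introduces redundancy by repeatedly shrinking the tail singular values toward zero, and phase 2 prunes by truncated SVD, which is then nearly lossless.

```python
import numpy as np

# Toy train-then-prune sketch (assumed mechanics, not the paper's code).
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))
r, tau = 8, 0.3  # target rank and per-step shrinkage

def tail_shrink(W, r, tau):
    """One 'post-training' step: soft-threshold singular values beyond top-r."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s[r:] = np.maximum(s[r:] - tau, 0.0)
    return U @ np.diag(s) @ Vt

# Phase 1: introduce redundancy deliberately before any pruning happens.
for _ in range(100):
    W = tail_shrink(W, r, tau)

# Phase 2: prune to rank r. Because the tail is already ~0, the loss is tiny.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_pruned = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]
err = np.linalg.norm(W - W_pruned) / np.linalg.norm(W)
print(f"relative pruning error: {err:.2e}")
```

Directly truncating the original random matrix would discard large singular values and incur substantial error; shrinking first makes the final truncation essentially free, which is the intuition behind mitigating irreversible knowledge loss.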
Class-centric neuron collapse and truncated nuclear norm regularization

The authors develop two key technical components: class-centric neuron collapse (CCNC) for FFN modules that collapses neurons onto anchor neurons with highest class-specific activations, and truncated nuclear norm regularization (TNNR) for MHA modules that enforces low-rank structures to enable near-lossless pruning via singular value decomposition.

7 retrieved papers
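Both components admit compact numerical sketches. The code below is an illustration under assumed mechanics, not the authors' implementation: for CCNC, once a non-anchor neuron's input weights coincide with an anchor's (the collapsed state the description implies), its output weights can be folded into the anchor and the neuron deleted without changing the FFN output; the `tnnr` helper computes the truncated nuclear norm, i.e. the sum of singular values beyond the top-k.

```python
import numpy as np

# Toy sketch (assumed mechanics) of class-centric neuron collapse.
rng = np.random.default_rng(1)
d, h = 8, 6
W1 = rng.normal(size=(h, d))   # input weights, one row per hidden neuron
W2 = rng.normal(size=(d, h))   # output weights, one column per hidden neuron

anchor, collapsed = 0, 5
W1[collapsed] = W1[anchor]     # post-training has collapsed neuron 5 onto 0

def ffn(x, W1, W2):
    return W2 @ np.maximum(W1 @ x, 0.0)  # ReLU FFN for simplicity

# Prune: fold the collapsed neuron's output weights into the anchor, drop it.
W2p = W2.copy()
W2p[:, anchor] += W2p[:, collapsed]
keep = [i for i in range(h) if i != collapsed]
W1p, W2p = W1[keep], W2p[:, keep]

x = rng.normal(size=d)
diff = np.abs(ffn(x, W1, W2) - ffn(x, W1p, W2p)).max()
print(f"max output difference after pruning: {diff:.2e}")

def tnnr(W, k):
    """Truncated nuclear norm: sum of singular values beyond the top-k."""
    return np.linalg.svd(W, compute_uv=False)[k:].sum()
```

Penalizing `tnnr` during post-training drives MHA weights toward rank k, so that the subsequent SVD-based truncation (as in the description above) discards only near-zero singular values.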

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
