SiNGER: A Clearer Voice Distills Vision Transformers Further

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: vision foundation models, model compression, knowledge distillation, representation learning
Abstract:

Vision Transformers are widely adopted as the backbone of vision foundation models, but they are known to produce high-norm artifacts that degrade representation quality. When knowledge distillation transfers these features to students, the high-norm artifacts dominate the objective, so students overfit to artifacts and underweight informative signals, diminishing the gains from larger models. Prior work attempted to remove artifacts but encountered an inherent trade-off between artifact suppression and preserving informative signals from teachers. To address this, we introduce Singular Nullspace-Guided Energy Reallocation (SiNGER), a novel distillation framework that suppresses artifacts while preserving informative signals. The key idea is principled teacher feature refinement: during refinement, we leverage nullspace-guided perturbation to preserve information while suppressing artifacts; the refined teacher's features are then distilled to a student. We implement this perturbation efficiently with a LoRA-based adapter that requires minimal structural modification. Extensive experiments show that SiNGER consistently improves student models, achieving state-of-the-art performance on multiple downstream tasks and producing clearer, more interpretable representations.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces SiNGER, a distillation framework that refines teacher features using nullspace-guided perturbations to suppress high-norm artifacts while preserving informative signals. It resides in the 'Feature Refinement for Artifact Suppression' leaf, which contains only two papers total (including this one). This leaf sits within the broader 'Artifact-Aware Distillation Methods' branch, indicating a relatively sparse research direction focused on proactive artifact handling during distillation rather than post-hoc correction.

The taxonomy reveals three main branches: Artifact-Aware Distillation Methods, Artifact Detection and Correction, and Domain-Specific Applications. SiNGER's leaf neighbors include 'Cross-Quality Knowledge Distillation' and 'Efficient Distilled Architectures for Artifact Removal', both addressing artifact challenges through different mechanisms (quality bridging and compact architectures, respectively). The sibling paper in the same leaf (Self-Distilled Registers) also tackles feature refinement, suggesting this specific approach—manipulating teacher representations during distillation—is an emerging but not yet crowded area.

Among 20 candidates examined across three contributions, none were found to clearly refute the proposed methods. The SiNGER framework and LoRA-based adapter each had 10 candidates examined with zero refutable overlaps, while the artifact-induced gradient bias analysis had no candidates examined. This limited search scope suggests the specific combination of nullspace-guided perturbation with LoRA-based refinement appears novel within the examined literature, though the analysis does not cover exhaustive prior work on general distillation or artifact suppression techniques outside the top-20 semantic matches.

Based on the top-20 semantic search results, the work appears to occupy a relatively unexplored intersection of feature refinement and artifact-aware distillation. The sparse taxonomy leaf and absence of refutable candidates suggest novelty, though the limited search scope means potentially relevant work in broader distillation or representation learning may exist outside this analysis. The framework's positioning between proactive refinement and domain-agnostic methods distinguishes it from both post-hoc correction approaches and task-specific adaptations.

Taxonomy

Core-task Taxonomy Papers: 10
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Papers: 0

Research Landscape Overview

Core task: knowledge distillation for vision transformers with artifact suppression. The field addresses the challenge of transferring knowledge from large vision transformers to compact student models while mitigating visual artifacts that can degrade performance. The taxonomy organizes research into three main branches: Artifact-Aware Distillation Methods, which develop specialized training strategies to reduce artifacts during the distillation process itself; Artifact Detection and Correction, which focuses on identifying and removing artifacts either before or after distillation; and Domain-Specific Applications of Distilled Vision Transformers, which adapt these techniques to specialized imaging domains such as medical imaging, satellite imagery, and deepfake detection. Works like Self-Distilled Registers[1] exemplify feature refinement approaches, while domain applications span histopathology (Histological Knowledge Distillation[2], HistoArtifacts[9]), ophthalmology (Glaucomatous Field Prediction[3]), and remote sensing (Sat-net[4]).

A particularly active line of work centers on feature refinement strategies within artifact-aware distillation, where methods aim to suppress spurious patterns introduced during compression. SiNGER[0] sits squarely in this branch alongside Self-Distilled Registers[1], both emphasizing internal feature manipulation to maintain representation quality. In contrast, artifact detection approaches like SEM Artifact Removal[6] and DINO-Detect[10] tackle the problem post hoc by identifying and correcting defects in generated outputs.

Domain-specific applications reveal a tension between general-purpose distillation and task-specific artifact patterns: medical imaging works (Histological Knowledge Distillation[2], Sparse-View CT[8]) must handle domain-unique noise characteristics, while deepfake detection (Deepfake Vision Transformer[7]) requires preserving subtle forensic traces. SiNGER[0] distinguishes itself by focusing on proactive artifact suppression during distillation rather than relying on separate detection or correction stages, positioning it as a refinement-centric approach that complements both detection-based methods and domain-specific adaptations.

Claimed Contributions

SiNGER distillation framework with nullspace-guided perturbation

The authors propose SiNGER, a knowledge distillation framework that refines teacher features by applying perturbations guided toward the left-nullspace of the next block. This approach suppresses high-norm artifacts in Vision Transformers while preserving informative signals, addressing a fundamental trade-off in distillation.

10 retrieved papers
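The nullspace-guided perturbation described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the "next block" reduces to a plain linear map `W`, and all names and shapes below are hypothetical. The point is that any correction projected onto the nullspace of `W`ᵀ modifies the teacher features without changing what the next block sees.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 4 tokens of dim 8; the next-block weight maps 8 -> 3,
# so directions v with v @ W == 0 form a 5-dimensional nullspace.
F = rng.normal(size=(4, 8))          # teacher features (tokens x dim)
W = rng.normal(size=(8, 3))          # next block's linear weight (assumed)

# Basis of the nullspace of W^T: rows of Vh past rank(W) satisfy v @ W == 0.
_, s, Vh = np.linalg.svd(W.T)
rank = int(np.sum(s > 1e-10))
null_basis = Vh[rank:]               # (5 x 8), spans the nullspace

# Project a desired correction (here: pulling a high-norm token toward the
# mean token) onto the nullspace, so the next block's input-output map is
# unaffected by the refinement.
correction = F.mean(axis=0) - F[0]
delta = correction @ null_basis.T @ null_basis
F_refined = F.copy()
F_refined[0] += delta

# The perturbation is invisible to the next block:
assert np.allclose(F_refined @ W, F @ W)
```

Under this reading, the nullspace acts as a "free" subspace in which artifact energy can be reallocated without disturbing the signal the downstream block consumes, which matches the trade-off the contribution claims to resolve.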
LoRA-based adapter for efficient teacher feature refinement

The authors implement the nullspace-guided perturbation using a lightweight LoRA-based adapter with nullspace initialization. This adapter produces minimal perturbations to teacher features while requiring only 1.2% additional parameters, enabling efficient artifact suppression during distillation.

10 retrieved papers
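The adapter can be sketched under simplifying assumptions. The names (`W_next`, `A`, `B`) and the exact initialization scheme below are our guesses, not the paper's code: we read "nullspace initialization" as starting the rank-r LoRA update inside the next block's nullspace, so that at initialization the adapter perturbs teacher features without changing the next block's output.

```python
import numpy as np

rng = np.random.default_rng(1)

dim, rank_r, out = 8, 2, 3
W_next = rng.normal(size=(dim, out))      # next block's weight (assumed linear)

# Nullspace basis of W_next^T, as in the nullspace-guided formulation.
_, s, Vh = np.linalg.svd(W_next.T)
null_basis = Vh[int(np.sum(s > 1e-10)):]  # (dim - out) x dim

# LoRA-style adapter: refined = F + (F @ A) @ B, a rank-r update.
# Nullspace initialization (our reading): B's rows start inside the
# nullspace, so B @ W_next == 0 and the initial perturbation is invisible
# to the next block.
A = rng.normal(size=(dim, rank_r)) * 0.01
B = null_basis[:rank_r]                   # (r x dim)

def adapter(F):
    return F + (F @ A) @ B

F = rng.normal(size=(4, dim))
assert np.allclose(adapter(F) @ W_next, F @ W_next)

# Parameter overhead is 2 * dim * r adapter weights, versus dim * dim for a
# full linear layer -- consistent in spirit with the ~1.2% figure claimed,
# though the exact number depends on the real model's dimensions.
```

A rank-2 update over an 8-dimensional feature space is of course a toy; the design choice being illustrated is that low-rank structure keeps the adapter cheap while the initialization keeps it harmless until training moves it.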
Analysis of artifact-induced gradient bias in ViT distillation

The authors provide a theoretical and empirical analysis showing that high-norm artifacts in Vision Transformers dominate the distillation objective, causing gradient bias toward outlier tokens. This analysis reveals why students overfit to artifacts and underweight informative signals, motivating their principled refinement approach.

0 retrieved papers
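The gradient-bias claim is easy to reproduce in a toy setting. The sketch below is ours, not the authors' analysis: under a plain MSE distillation loss, the gradient with respect to the student features is proportional to the residual, so a single high-norm teacher token captures a disproportionate share of the total gradient norm.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 16 tokens of dim 32; one "artifact" token with 20x the norm.
T = rng.normal(size=(16, 32))
T[0] *= 20.0                      # high-norm artifact token
S = np.zeros_like(T)              # student features at initialization

# MSE distillation loss L = mean((S - T)**2); its gradient w.r.t. the
# student features is dL/dS = 2 * (S - T) / T.size.
grad = 2.0 * (S - T) / T.size
per_token = np.linalg.norm(grad, axis=1)

# The artifact token contributes the dominant share of the gradient norm,
# biasing the student's updates toward reproducing the artifact.
share = per_token[0] / per_token.sum()
print(f"artifact token's gradient share: {share:.2%}")
```

With one token at 20x the norm of the other fifteen, that single token accounts for well over half of the summed per-token gradient norm, which is the overfitting mechanism this contribution analyzes.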

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

SiNGER distillation framework with nullspace-guided perturbation

The authors propose SiNGER, a knowledge distillation framework that refines teacher features by applying perturbations guided toward the left-nullspace of the next block. This approach suppresses high-norm artifacts in Vision Transformers while preserving informative signals, addressing a fundamental trade-off in distillation.

Contribution

LoRA-based adapter for efficient teacher feature refinement

The authors implement the nullspace-guided perturbation using a lightweight LoRA-based adapter with nullspace initialization. This adapter produces minimal perturbations to teacher features while requiring only 1.2% additional parameters, enabling efficient artifact suppression during distillation.

Contribution

Analysis of artifact-induced gradient bias in ViT distillation

The authors provide a theoretical and empirical analysis showing that high-norm artifacts in Vision Transformers dominate the distillation objective, causing gradient bias toward outlier tokens. This analysis reveals why students overfit to artifacts and underweight informative signals, motivating their principled refinement approach.