Mechanism of Task-oriented Information Removal in In-context Learning

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Mechanistic Interpretability, In-context Learning, Large Language Model
Abstract:

In-context Learning (ICL) is an emerging few-shot learning paradigm built on modern Language Models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate this mechanism from a novel perspective: information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in hidden states that contain information for all possible tasks, leading to arbitrary outputs that fail to focus on the intended task and thus to near-zero accuracy. Meanwhile, we find that selectively removing specific information from hidden states with a low-rank filter effectively steers LMs toward the intended task. Building on these findings, by measuring the hidden states with carefully designed metrics, we observe that few-shot ICL effectively simulates such a task-oriented information removal process: it selectively removes redundant information from entangled non-selective representations and improves the output based on the demonstrations, which constitutes a key mechanism underlying ICL. Moreover, we identify the essential attention heads that induce the removal operation, termed Denoising Heads. Ablation experiments that block the information removal operation during inference show that ICL accuracy significantly degrades, especially when the correct label is absent from the few-shot demonstrations, confirming the critical role of both the information removal mechanism and the Denoising Heads.

Disclaimer
This report is AI-generated using Large Language Models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates how in-context learning (ICL) selectively removes task-irrelevant information from hidden states, proposing that few-shot demonstrations simulate a low-rank filtering process that steers models toward intended tasks. It resides in the 'Task-Oriented Information Filtering' leaf under 'Mechanistic Understanding of Information Removal in ICL', which contains only two papers total. This sparse population suggests the mechanistic angle—analyzing internal filtering operations rather than applied unlearning or privacy methods—remains relatively underexplored compared to adjacent branches like machine unlearning (six papers across five leaves) or efficiency techniques (three papers).

The taxonomy tree reveals neighboring research directions: 'Dual Process Learning and In-Context vs In-Weights Strategies' (two papers) examines structural ICL and weight forgetting, while 'Machine Unlearning in Language Models' spans five leaves addressing demonstration-based unlearning, parameter arithmetic, and concept removal. The paper's focus on hidden-state filtering diverges from these by probing internal representations rather than post-hoc knowledge erasure or privacy-preserving synthesis. The 'exclude_note' clarifies that general ICL efficiency methods without mechanistic analysis belong elsewhere, reinforcing that this leaf targets interpretability of filtering operations specifically.

Among 27 candidates examined across three contributions, none yielded refutable prior work. The 'Novel evaluation framework' contribution examined seven candidates with zero refutations; 'ICL mechanism via task-oriented information removal' and 'Identification of Denoising Heads' each examined ten candidates, also with zero refutations. This absence of overlapping claims within the limited search scope—coupled with the sparse two-paper leaf—suggests the mechanistic framing and denoising-head identification may represent relatively fresh angles. However, the search scale (27 candidates, not hundreds) means undiscovered related work in broader ICL interpretability or attention analysis remains possible.

Given the sparse taxonomy leaf and zero refutations across 27 examined candidates, the work appears to occupy a less-crowded niche within ICL research. The mechanistic focus on hidden-state filtering and attention-head roles distinguishes it from applied unlearning or privacy methods, though the limited search scope precludes definitive claims about novelty across the entire interpretability literature. The analysis covers top-K semantic matches and citation expansion but does not exhaustively survey all attention mechanism studies or ICL theory papers.

Taxonomy

Core-task Taxonomy Papers: 32
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: information removal mechanism in in-context learning. The field explores how large language models selectively filter, suppress, or eliminate information during in-context learning, spanning mechanistic interpretations, privacy safeguards, and practical efficiency gains.

The taxonomy organizes research into several main branches: mechanistic studies that probe how models internally remove or ignore task-irrelevant signals (e.g., Task Information Removal[0], Unlearnable Algorithms[1]); machine unlearning approaches that erase specific knowledge or behaviors from pretrained models (In-context Unlearning[2], Agentic LLM Unlearning[3]); privacy-preserving methods that protect sensitive demonstration data (Private Prompt Synthesis[4], Privacy Preserving In-context[10]); knowledge editing techniques that update or correct facts via context (Edit Factual Knowledge[27], Concept Unlearning[14]); efficiency and compression strategies that distill or prune context to reduce computational overhead (Compressing Context[6], Demonstration Augmentation[5]); and domain-specific applications ranging from recommendation systems to video understanding. These branches reflect overlapping concerns, as privacy often motivates unlearning and compression can serve both efficiency and information control, yet each emphasizes distinct technical challenges and evaluation criteria.

Particularly active lines of work include mechanistic investigations into how transformers filter task-irrelevant cues and unlearning methods that balance forgetting targeted knowledge against catastrophic side effects (Mitigating Excessive Forgetting[20], Forget to Know[21]). Privacy-focused studies grapple with the tension between utility and confidentiality when demonstrations contain sensitive data, while knowledge editing research addresses the challenge of updating facts without retraining.
Task Information Removal[0] sits squarely within the mechanistic understanding branch, focusing on task-oriented filtering—how models discern which contextual signals to suppress during inference. This emphasis aligns closely with Unlearnable Algorithms[1], which also examines controlled information suppression, though the latter may prioritize adversarial robustness over interpretability. Compared to broader unlearning efforts like In-context Unlearning[2] or privacy methods such as Private Prompt Synthesis[4], Task Information Removal[0] zeroes in on the internal mechanisms that enable selective attention and filtering, offering a more granular view of how context shapes model behavior without invoking full retraining or encryption.

Claimed Contributions

Novel evaluation framework for task-oriented information removal

The authors introduce a framework that measures information removal in hidden states through two geometric metrics: eccentricity (the magnitude of information removal) and covariance flux on the Task-Verbalization Subspace (the correctness of information removal). The framework is designed to be versatile beyond the ICL scenario.

7 retrieved papers
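The two metrics are not defined in detail in this report, but the eccentricity idea can be illustrated with a minimal sketch. Assuming, hypothetically, that eccentricity measures how strongly a batch of hidden states concentrates along its leading principal direction, a rough implementation might look like:

```python
import numpy as np

def eccentricity(hidden_states: np.ndarray) -> float:
    """Hypothetical eccentricity: fraction of variance captured by the
    leading principal component of hidden states ([n_samples, d_model]).
    A value near 1 means the states collapse onto one direction, i.e.
    most task-irrelevant variance has been removed."""
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    # Singular values of the centered matrix give covariance eigenvalues.
    sing = np.linalg.svd(centered, compute_uv=False)
    var = sing ** 2
    return float(var[0] / var.sum())

rng = np.random.default_rng(0)
diffuse = rng.normal(size=(64, 32))            # variance spread over many directions
focused = np.outer(rng.normal(size=64), rng.normal(size=32))
focused += 0.01 * rng.normal(size=(64, 32))    # nearly rank-one: one direction left
assert eccentricity(diffuse) < eccentricity(focused)
```

Under this reading, higher eccentricity after information removal would indicate that hidden states have collapsed toward a single task-relevant direction; the paper's actual definition may differ.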
ICL mechanism via task-oriented information removal

The authors propose that ICL works by selectively removing task-irrelevant information from query representations rather than copying new information. Demonstrations guide the model to filter out redundant information from non-selective representations, steering outputs toward the intended task.

10 retrieved papers
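The low-rank filtering view of this mechanism can be sketched as a projection that deletes a subspace from the hidden state. The `basis` matrix below is a hypothetical stand-in for whatever redundant-information subspace the demonstrations implicitly select; it is not the authors' construction:

```python
import numpy as np

def low_rank_removal_filter(hidden: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of `hidden` lying in the span of `basis`.

    hidden: [d_model] or [n, d_model] hidden state(s).
    basis:  [d_model, r] orthonormal columns spanning the redundant
            (task-irrelevant) subspace to filter out.
    Returns (I - U U^T) h, a linear filter of rank d_model - r.
    """
    return hidden - hidden @ basis @ basis.T

rng = np.random.default_rng(1)
d, r = 16, 3
# Orthonormal basis for a hypothetical "redundant" subspace.
basis, _ = np.linalg.qr(rng.normal(size=(d, r)))
h = rng.normal(size=d)
h_clean = low_rank_removal_filter(h, basis)
# The filtered state has no remaining component in the removed subspace.
assert np.allclose(h_clean @ basis, 0.0)
```

The design point is that removal is purely subtractive: no new task information is copied in, matching the claim that demonstrations steer the model by filtering rather than adding.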
Identification of Denoising Heads

The authors identify a set of attention heads, termed Denoising Heads, that perform the task-oriented information removal operation. These heads are shown to be independent of induction heads and critical for ICL performance, especially in unseen-label scenarios, where ablating them causes accuracy to drop nearly to zero.

10 retrieved papers
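Head ablation of the kind described above is commonly implemented by zeroing a head's slice of the concatenated attention output before the output projection. The sketch below is a generic illustration, not the authors' procedure; head indices and shapes are invented:

```python
import numpy as np

def ablate_heads(attn_output: np.ndarray, n_heads: int,
                 heads_to_ablate: set) -> np.ndarray:
    """Zero out per-head slices of a concatenated multi-head attention
    output (shape [seq_len, n_heads * d_head]), blocking the chosen
    heads (here, hypothetical Denoising Heads) from contributing to
    the residual stream."""
    seq_len, d_model = attn_output.shape
    d_head = d_model // n_heads
    out = attn_output.copy()
    for h in heads_to_ablate:
        out[:, h * d_head:(h + 1) * d_head] = 0.0
    return out

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 8 * 4))          # 5 tokens, 8 heads of width 4
y = ablate_heads(x, n_heads=8, heads_to_ablate={2, 7})
assert np.allclose(y[:, 8:12], 0.0) and np.allclose(y[:, 28:32], 0.0)
assert np.allclose(y[:, :8], x[:, :8])   # untouched heads unchanged
```

In the unseen-label setting the report describes, disabling the removal operation this way is what reportedly drives ICL accuracy toward zero.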

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Novel evaluation framework for task-oriented information removal

The authors introduce a framework that measures information removal in hidden states through two geometric metrics: eccentricity (the magnitude of information removal) and covariance flux on the Task-Verbalization Subspace (the correctness of information removal). The framework is designed to be versatile beyond the ICL scenario.

Contribution

ICL mechanism via task-oriented information removal

The authors propose that ICL works by selectively removing task-irrelevant information from query representations rather than copying new information. Demonstrations guide the model to filter out redundant information from non-selective representations, steering outputs toward the intended task.

Contribution

Identification of Denoising Heads

The authors identify a set of attention heads, termed Denoising Heads, that perform the task-oriented information removal operation. These heads are shown to be independent of induction heads and critical for ICL performance, especially in unseen-label scenarios, where ablating them causes accuracy to drop nearly to zero.
