Mechanism of Task-oriented Information Removal in In-context Learning

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Mechanistic Interpretability, In-context Learning, Large Language Model
Abstract:

In-context Learning (ICL) is an emerging few-shot learning paradigm built on modern Language Models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate this mechanism from a novel perspective: information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in hidden states that contain information for all possible tasks, leading to arbitrary outputs that fail to focus on the intended task and thus to near-zero accuracy. Meanwhile, we find that selectively removing specific information from hidden states with a low-rank filter effectively steers LMs toward the intended task. Building on these findings, by measuring the hidden states with carefully designed metrics, we observe that few-shot ICL effectively simulates such a task-oriented information removal process: it selectively removes redundant information from entangled non-selective representations and improves the output based on the demonstrations, which constitutes a key mechanism underlying ICL. Moreover, we identify the essential attention heads that induce the removal operation, termed Denoising Heads. Ablation experiments that block the information removal operation during inference show that ICL accuracy significantly degrades, especially when the correct label is absent from the few-shot demonstrations, confirming the critical role of both the information removal mechanism and the Denoising Heads.

Disclaimer
This report is AI-generated using Large Language Models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates how in-context learning (ICL) selectively removes task-irrelevant information from hidden states, proposing that few-shot demonstrations simulate a low-rank filtering process that steers models toward intended tasks. It resides in the 'Task-Oriented Information Filtering' leaf under 'Mechanistic Understanding of Information Removal in ICL', which contains only two papers total. This sparse population suggests the mechanistic angle—analyzing internal filtering operations rather than applied unlearning or privacy methods—remains relatively underexplored compared to adjacent branches like machine unlearning (six papers across five leaves) or efficiency techniques (three papers).

The taxonomy tree reveals neighboring research directions: 'Dual Process Learning and In-Context vs In-Weights Strategies' (two papers) examines structural ICL and weight forgetting, while 'Machine Unlearning in Language Models' spans five leaves addressing demonstration-based unlearning, parameter arithmetic, and concept removal. The paper's focus on hidden-state filtering diverges from these by probing internal representations rather than post-hoc knowledge erasure or privacy-preserving synthesis. The 'exclude_note' clarifies that general ICL efficiency methods without mechanistic analysis belong elsewhere, reinforcing that this leaf targets interpretability of filtering operations specifically.

Among 27 candidates examined across three contributions, none yielded refutable prior work. The 'Novel evaluation framework' contribution examined seven candidates with zero refutations; 'ICL mechanism via task-oriented information removal' and 'Identification of Denoising Heads' each examined ten candidates, also with zero refutations. This absence of overlapping claims within the limited search scope—coupled with the sparse two-paper leaf—suggests the mechanistic framing and denoising-head identification may represent relatively fresh angles. However, the search scale (27 candidates, not hundreds) means undiscovered related work in broader ICL interpretability or attention analysis remains possible.

Given the sparse taxonomy leaf and zero refutations across 27 examined candidates, the work appears to occupy a less-crowded niche within ICL research. The mechanistic focus on hidden-state filtering and attention-head roles distinguishes it from applied unlearning or privacy methods, though the limited search scope precludes definitive claims about novelty across the entire interpretability literature. The analysis covers top-K semantic matches and citation expansion but does not exhaustively survey all attention mechanism studies or ICL theory papers.

Taxonomy

Core-task Taxonomy Papers: 32
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: information removal mechanism in in-context learning. The field explores how large language models selectively filter, suppress, or eliminate information during in-context learning, spanning mechanistic interpretations, privacy safeguards, and practical efficiency gains.

The taxonomy organizes research into several main branches: mechanistic studies that probe how models internally remove or ignore task-irrelevant signals (e.g., Task Information Removal[0], Unlearnable Algorithms[1]); machine unlearning approaches that erase specific knowledge or behaviors from pretrained models (In-context Unlearning[2], Agentic LLM Unlearning[3]); privacy-preserving methods that protect sensitive demonstration data (Private Prompt Synthesis[4], Privacy Preserving In-context[10]); knowledge editing techniques that update or correct facts via context (Edit Factual Knowledge[27], Concept Unlearning[14]); efficiency and compression strategies that distill or prune context to reduce computational overhead (Compressing Context[6], Demonstration Augmentation[5]); and domain-specific applications ranging from recommendation systems to video understanding. These branches reflect overlapping concerns, as privacy often motivates unlearning and compression can serve both efficiency and information control, yet each emphasizes distinct technical challenges and evaluation criteria.

Particularly active lines of work include mechanistic investigations into how transformers filter task-irrelevant cues and unlearning methods that balance forgetting targeted knowledge against catastrophic side effects (Mitigating Excessive Forgetting[20], Forget to Know[21]). Privacy-focused studies grapple with the tension between utility and confidentiality when demonstrations contain sensitive data, while knowledge editing research addresses the challenge of updating facts without retraining.
Task Information Removal[0] sits squarely within the mechanistic understanding branch, focusing on task-oriented filtering—how models discern which contextual signals to suppress during inference. This emphasis aligns closely with Unlearnable Algorithms[1], which also examines controlled information suppression, though the latter may prioritize adversarial robustness over interpretability. Compared to broader unlearning efforts like In-context Unlearning[2] or privacy methods such as Private Prompt Synthesis[4], Task Information Removal[0] zeroes in on the internal mechanisms that enable selective attention and filtering, offering a more granular view of how context shapes model behavior without invoking full retraining or encryption.

Claimed Contributions

Novel evaluation framework for task-oriented information removal

The authors introduce a framework that measures information removal in hidden states through two geometric metrics: eccentricity (the magnitude of information removal) and covariance flux on the Task-Verbalization Subspace (the correctness of information removal). The framework is designed to be versatile beyond the ICL scenario.

7 retrieved papers
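The two metrics are not defined in detail in this report, but the eccentricity idea can be illustrated with a minimal sketch. Assuming, hypothetically, that eccentricity measures how strongly a batch of hidden states concentrates along its leading principal direction, a rough implementation might look like:

```python
import numpy as np

def eccentricity(hidden_states: np.ndarray) -> float:
    """Hypothetical eccentricity: fraction of variance captured by the
    leading principal component of hidden states ([n_samples, d_model]).
    A value near 1 means the states collapse onto one direction, i.e.
    most task-irrelevant variance has been removed."""
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    # Singular values of the centered matrix give covariance eigenvalues.
    sing = np.linalg.svd(centered, compute_uv=False)
    var = sing ** 2
    return float(var[0] / var.sum())

rng = np.random.default_rng(0)
diffuse = rng.normal(size=(64, 32))            # variance spread over many directions
focused = np.outer(rng.normal(size=64), rng.normal(size=32))
focused += 0.01 * rng.normal(size=(64, 32))    # nearly rank-one: one direction left
assert eccentricity(diffuse) < eccentricity(focused)
```

Under this reading, higher eccentricity after information removal would indicate that hidden states have collapsed toward a single task-relevant direction; the paper's actual definition may differ.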
ICL mechanism via task-oriented information removal

The authors propose that ICL works by selectively removing task-irrelevant information from query representations rather than copying new information. Demonstrations guide the model to filter out redundant information from non-selective representations, steering outputs toward the intended task.

10 retrieved papers
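The low-rank filtering view of this mechanism can be sketched as a projection that deletes a subspace from the hidden state. The `basis` matrix below is a hypothetical stand-in for whatever redundant-information subspace the demonstrations implicitly select; it is not the authors' construction:

```python
import numpy as np

def low_rank_removal_filter(hidden: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of `hidden` lying in the span of `basis`.

    hidden: [d_model] or [n, d_model] hidden state(s).
    basis:  [d_model, r] orthonormal columns spanning the redundant
            (task-irrelevant) subspace to filter out.
    Returns (I - U U^T) h, a linear filter of rank d_model - r.
    """
    return hidden - hidden @ basis @ basis.T

rng = np.random.default_rng(1)
d, r = 16, 3
# Orthonormal basis for a hypothetical "redundant" subspace.
basis, _ = np.linalg.qr(rng.normal(size=(d, r)))
h = rng.normal(size=d)
h_clean = low_rank_removal_filter(h, basis)
# The filtered state has no remaining component in the removed subspace.
assert np.allclose(h_clean @ basis, 0.0)
```

The design point is that removal is purely subtractive: no new task information is copied in, matching the claim that demonstrations steer the model by filtering rather than adding.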
Identification of Denoising Heads

The authors identify a set of attention heads, termed Denoising Heads, that perform the task-oriented information removal operation. These heads are shown to be independent of induction heads and critical for ICL performance, especially in unseen-label scenarios, where ablating them causes accuracy to drop nearly to zero.

10 retrieved papers
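Head ablation of the kind described above is commonly implemented by zeroing a head's slice of the concatenated attention output before the output projection. The sketch below is a generic illustration, not the authors' procedure; head indices and shapes are invented:

```python
import numpy as np

def ablate_heads(attn_output: np.ndarray, n_heads: int,
                 heads_to_ablate: set) -> np.ndarray:
    """Zero out per-head slices of a concatenated multi-head attention
    output (shape [seq_len, n_heads * d_head]), blocking the chosen
    heads (here, hypothetical Denoising Heads) from contributing to
    the residual stream."""
    seq_len, d_model = attn_output.shape
    d_head = d_model // n_heads
    out = attn_output.copy()
    for h in heads_to_ablate:
        out[:, h * d_head:(h + 1) * d_head] = 0.0
    return out

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 8 * 4))          # 5 tokens, 8 heads of width 4
y = ablate_heads(x, n_heads=8, heads_to_ablate={2, 7})
assert np.allclose(y[:, 8:12], 0.0) and np.allclose(y[:, 28:32], 0.0)
assert np.allclose(y[:, :8], x[:, :8])   # untouched heads unchanged
```

In the unseen-label setting the report describes, disabling the removal operation this way is what reportedly drives ICL accuracy toward zero.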

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Novel evaluation framework for task-oriented information removal

The authors introduce a framework that measures information removal in hidden states through two geometric metrics: eccentricity (the magnitude of information removal) and covariance flux on the Task-Verbalization Subspace (the correctness of information removal). The framework is designed to be versatile beyond the ICL scenario.

Contribution

ICL mechanism via task-oriented information removal

The authors propose that ICL works by selectively removing task-irrelevant information from query representations rather than copying new information. Demonstrations guide the model to filter out redundant information from non-selective representations, steering outputs toward the intended task.

Contribution

Identification of Denoising Heads

The authors identify a set of attention heads, termed Denoising Heads, that perform the task-oriented information removal operation. These heads are shown to be independent of induction heads and critical for ICL performance, especially in unseen-label scenarios, where ablating them causes accuracy to drop nearly to zero.
