Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Mechanistic Interpretability, In-context Learning, Large Language Model
Abstract:

We investigate the mechanistic underpinnings of in-context learning (ICL) in large language models by reconciling two dominant perspectives: the component-level analysis of attention heads and the holistic decomposition of ICL into Task Recognition (TR) and Task Learning (TL). We propose a novel framework, Task Subspace Logit Attribution (TSLA), to identify attention heads specialized in TR and TL, and demonstrate their distinct yet complementary roles. Through correlation analysis, ablation studies, and input perturbations, we show that the identified TR and TL heads independently and effectively capture the TR and TL components of ICL. Via steering experiments and geometric analysis of hidden states, we reveal that TR heads promote task recognition by aligning hidden states with the task subspace, while TL heads rotate hidden states within that subspace toward the correct label to facilitate accurate prediction. We further show how prior findings on ICL's mechanism, including induction heads and task vectors, can be reconciled with our attention-head-level analysis of the TR-TL decomposition. Our framework thus provides a unified and interpretable account of how LLMs execute ICL across diverse tasks and settings.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 38
Claimed Contributions: 3
Contribution Candidate Papers Compared: 0
Refutable Papers: 0

Research Landscape Overview

Core task: mechanistic analysis of in-context learning through attention head decomposition. This field seeks to understand how transformer models perform in-context learning by dissecting the roles of individual attention heads and internal components. The taxonomy reflects a multifaceted landscape: one major branch examines Attention Head Functional Specialization, cataloging how heads develop distinct roles such as induction (Induction Heads[4]), retrieval (Retrieval Head Factuality[3]), or semantic pattern matching (Semantic Induction Heads[1]). Another branch focuses on Task Recognition and Task Learning Decomposition, exploring how models separate the recognition of task structure from the execution of task-specific computations. Additional branches address Training Dynamics and Emergence, which trace how these mechanisms arise during learning (Learning to Grok[31]); Component-Level Interventions that test causal importance (Causal Head Gating[15]); Theoretical Foundations linking architectures to algorithmic primitives (ICL Architectures Algorithms[13]); Domain-Specific Applications extending insights to vision or robotics (Vision Transformer Interpretability[5], Mechanistic Finetuning VLA[36]); and Survey and Methodological Frameworks that synthesize interpretability techniques (Attention Heads Survey[2]).

Several active lines of work reveal contrasting emphases and open questions. Some studies pursue fine-grained localization of function, identifying which heads or neurons are causally responsible for specific behaviors (Which Heads Matter[11], Context-Sensitive Neurons[12]), while others investigate higher-level abstractions such as task representations in hidden states (ICL Representations[20], Hidden State Geometry[24]) or the interplay between task recognition and task execution.
The original paper, Task Recognition Localization[0], sits squarely within the Task Recognition and Task Learning Decomposition branch, closely aligned with work that disentangles these two phases at the head level. Its emphasis on localizing task recognition mechanisms complements nearby efforts like Task Information Removal[27], which ablates task-related signals, offering a causal counterpart to the localization perspective. This positioning highlights ongoing debates about whether in-context learning emerges from modular, interpretable circuits or from distributed, entangled representations across layers.

Claimed Contributions

Task Subspace Logit Attribution (TSLA) framework for identifying TR and TL heads

The authors introduce TSLA, a theoretically grounded method that identifies attention heads responsible for Task Recognition and Task Learning in in-context learning by measuring head contributions relative to task-label unembeddings in geometric subspace terms, addressing limitations of prior attribution approaches.

0 retrieved papers
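The TSLA description above amounts to projecting a head's output contribution onto the subspace spanned by the task-label unembedding vectors. A minimal NumPy sketch of that idea follows; all dimensions, tensors, and the score itself are hypothetical toy stand-ins, since the paper's exact attribution formula is not reproduced in this report:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): model width and a two-label task.
d_model, n_labels = 64, 2

# Stand-in for the unembedding rows of the task's label tokens (W_U[label_ids]).
label_unembed = rng.standard_normal((n_labels, d_model))

# Stand-in for one attention head's output contribution to the residual
# stream at the final token position.
head_out = rng.standard_normal(d_model)

# Orthonormal basis for the task-label subspace via QR decomposition.
Q, _ = np.linalg.qr(label_unembed.T)  # shape (d_model, n_labels)

# Component of the head's contribution lying inside the label subspace.
in_subspace = Q @ (Q.T @ head_out)

# A TSLA-style score could measure how much of the head's contribution
# lands in the task subspace relative to its total magnitude.
tsla_score = np.linalg.norm(in_subspace) / np.linalg.norm(head_out)
print(f"toy subspace-attribution score: {tsla_score:.4f}")
```

A score near 1 would indicate a head that writes almost entirely into the label subspace; ranking heads by such a score is one plausible way a TSLA-style method could localize TR and TL heads, though the actual method may weigh attributions differently.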
Geometric characterization of TR and TL head mechanisms

Through steering experiments and geometric analysis, the authors demonstrate that TR heads align hidden states to the task-label subspace for label-space recognition, while TL heads perform within-subspace rotations toward correct labels to enable accurate prediction.

0 retrieved papers
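The claimed mechanism, subspace alignment for TR heads and within-subspace rotation toward the correct label for TL heads, can be made concrete with two toy metrics. This is a hedged NumPy sketch over random stand-in directions, not the paper's actual measurement:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 64

# Hypothetical label directions spanning a two-dimensional task subspace.
labels = rng.standard_normal((2, d_model))
Q, _ = np.linalg.qr(labels.T)  # orthonormal basis, shape (d_model, 2)

def subspace_alignment(h):
    """Fraction of a hidden state's norm inside the label subspace
    (the quantity TR-style steering would increase)."""
    proj = Q @ (Q.T @ h)
    return np.linalg.norm(proj) / np.linalg.norm(h)

def angle_to_label(h, label_vec):
    """Angle (radians) between the in-subspace components of a hidden
    state and a label direction (the quantity TL-style steering would shrink)."""
    hp, lp = Q.T @ h, Q.T @ label_vec  # 2-d coordinates within the subspace
    cos = hp @ lp / (np.linalg.norm(hp) * np.linalg.norm(lp))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

h = rng.standard_normal(d_model)
correct = labels[0]

# "TR head" effect: add a component inside the subspace -> alignment rises.
h_tr = h + 2.0 * (Q @ (Q.T @ h))

# "TL head" effect: nudge the in-subspace part toward the correct label
# -> the within-subspace angle to that label shrinks.
h_tl = h + 0.5 * (Q @ (Q.T @ correct))

print(f"alignment: {subspace_alignment(h):.3f} -> {subspace_alignment(h_tr):.3f}")
print(f"angle:     {angle_to_label(h, correct):.3f} -> {angle_to_label(h_tl, correct):.3f}")
```

The two prints illustrate the claimed division of labor in miniature: the TR-style update changes how much of the state lies in the subspace, while the TL-style update changes where it points within the subspace.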
Unified framework reconciling component-level and holistic ICL perspectives

The authors establish a unified framework that bridges attention-head-level mechanistic analysis with the holistic Task Recognition and Task Learning decomposition, reconciling prior findings on induction heads and task vectors within this integrated perspective.

0 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the retrieved top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though this judgment remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Task Subspace Logit Attribution (TSLA) framework for identifying TR and TL heads

The authors introduce TSLA, a theoretically grounded method that identifies attention heads responsible for Task Recognition and Task Learning in in-context learning by measuring head contributions relative to task-label unembeddings in geometric subspace terms, addressing limitations of prior attribution approaches.

Contribution

Geometric characterization of TR and TL head mechanisms

Through steering experiments and geometric analysis, the authors demonstrate that TR heads align hidden states to the task-label subspace for label-space recognition, while TL heads perform within-subspace rotations toward correct labels to enable accurate prediction.

Contribution

Unified framework reconciling component-level and holistic ICL perspectives

The authors establish a unified framework that bridges attention-head-level mechanistic analysis with the holistic Task Recognition and Task Learning decomposition, reconciling prior findings on induction heads and task vectors within this integrated perspective.