IA2: Alignment with ICL Activations improves Supervised Fine-Tuning

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: In-Context Learning (ICL), Supervised Fine-Tuning (SFT), Adaptation
Abstract:

Supervised Fine-Tuning (SFT) specializes model behavior by training weights to produce intended target responses for queries. In contrast, In-Context Learning (ICL) adapts models at inference time through instructions or demonstrations in the prompt. In data-scarce settings, ICL can offer better generalization and more calibrated responses than SFT, at the cost of more inference compute. In this work, we ask: can ICL's internal computations be used to improve the quality of SFT? We first show that ICL and SFT produce distinct activation patterns, indicating that the two methods achieve adaptation through different functional mechanisms. Motivated by this observation, and to exploit ICL's rich functionality, we introduce ICL Activation Alignment (IA2), a self-distillation technique that aims to replicate ICL's activation patterns in SFT models and thereby incentivize ICL-like internal reasoning. Performing IA2 as a priming step before SFT significantly improves the accuracy and calibration of model outputs, as shown by extensive empirical results on 12 popular benchmarks and two model families. This finding is not only practically useful but also offers a conceptual window into the inner mechanics of model adaptation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces ICL Activation Alignment (IA2), a self-distillation technique that aligns supervised fine-tuning (SFT) activation patterns with those observed during in-context learning (ICL). It resides in the 'ICL-SFT Activation Alignment' leaf, which contains five papers total, indicating a moderately populated niche within the broader 'Activation-Based Alignment Methods' branch. This leaf specifically targets methods that bridge ICL and SFT through internal representation matching, distinguishing it from general alignment approaches that do not explicitly leverage activation-level insights from ICL mechanisms.

The taxonomy reveals neighboring research directions including 'Cross-Modal and Cross-Lingual Activation Alignment' (2 papers) and 'Attention Mechanism Activation Analysis' (2 papers), both exploring activation-based techniques but in different contexts. Parallel branches like 'ICL Demonstration Optimization' (3 papers) and 'ICL Mechanism Understanding' (2 papers) focus on improving or analyzing ICL itself rather than transferring its properties to SFT. The 'Multi-Objective and Preference-Based Alignment' branch (3 papers) and 'Self-Alignment and Minimal Supervision' (3 papers) pursue broader alignment paradigms without the specific activation-level ICL-SFT bridging that defines this work's contribution.

Of the 30 candidates examined (10 per contribution), the empirical demonstration of ICL-SFT activation divergence yielded one refutable candidate, suggesting some prior exploration of activation-pattern differences between these paradigms. The IA2 method itself and the two-step training pipeline were each compared against 10 candidates with zero refutations, indicating that these specific technical contributions appear less directly anticipated within the search scope. These statistics suggest that the core methodological innovation (IA2 as a priming step) may be more novel than the observation that ICL and SFT produce distinct activations, though the search scope remains constrained.

Based on top-30 semantic matches, the work appears to occupy a recognizable but not overcrowded research direction. The taxonomy structure shows this is one of several complementary approaches to leveraging ICL insights for improved alignment, with the activation-level focus providing a distinct angle compared to demonstration optimization or preference learning methods. The limited search scope means broader field coverage or more distant related work may not be fully captured.

Taxonomy

Core-task Taxonomy Papers: 37
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: Improving supervised fine-tuning through in-context learning activation alignment. The field addresses how to better align language models with desired behaviors by leveraging insights from in-context learning (ICL) mechanisms. The taxonomy reveals several complementary directions: Activation-Based Alignment Methods focus on directly manipulating or aligning internal model representations during training, often drawing connections between ICL dynamics and supervised fine-tuning; In-Context Learning Enhancement and Analysis investigates how to improve ICL itself through better demonstration selection, progressive strategies, and understanding of underlying mechanisms; Alignment Paradigms and Frameworks explore broader methodological approaches including self-alignment, principle-driven methods, and novel training objectives; Parameter-Efficient and Communication-Efficient Methods address scalability through techniques like low-rank adaptation and federated learning; and Domain-Specific Alignment Applications tailor these ideas to specialized contexts such as recommendation systems, healthcare diagnostics, and multilingual settings.

Representative works like Unlocking Spell[1] and In-Context Alignment[23] illustrate early efforts to bridge ICL and alignment, while Principle Driven Self-Alignment[7] and VPO[8] exemplify alternative paradigm innovations. A particularly active line of inquiry centers on understanding and exploiting the relationship between ICL's emergent capabilities and supervised fine-tuning's stability. Works like Progressive ICL[20] and Supervised ICL Fine-Tuning[25] explore how structured demonstration strategies can enhance learning, while Reasoning Distillation[5] and Rewards in Context[3] investigate transferring complex reasoning patterns.

ICL Activations Alignment[0] sits squarely within the activation-based branch, proposing that aligning internal activations between ICL and fine-tuning regimes can improve model performance. This approach contrasts with nearby methods like Missing Alignment Link[35] and Dual Alignment[37], which may emphasize different aspects of the alignment process or explore alternative bridging mechanisms between pre-training behaviors and task-specific adaptation. The central tension across these branches involves balancing the flexibility of ICL with the efficiency and robustness of fine-tuning, while maintaining interpretability of the underlying alignment mechanisms.

Claimed Contributions

IA2 (ICL Activation Alignment) method

The authors propose IA2, a self-distillation technique that aligns supervised fine-tuning models with the activation patterns produced during in-context learning. This priming step enforces functional alignment with ICL before standard SFT, enabling models to replicate ICL's internal reasoning mechanisms.

10 retrieved papers
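As a concrete illustration of the kind of objective such a self-distillation step could use, the sketch below computes an average per-layer mean-squared error between a student's activations on the bare query (SFT path) and a frozen teacher's activations on the demonstration-augmented prompt (ICL path). The representation (plain per-layer vectors at the answer position), the function names, and the MSE form are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of an IA2-style activation-alignment objective.
# Per-layer vectors and the MSE form are illustrative assumptions,
# not the authors' code.

def mse(a, b):
    """Mean squared error between two equal-length activation vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def ia2_loss(student_acts, teacher_acts):
    """Average per-layer MSE between the student's activations on the
    bare query (SFT path) and a frozen teacher's activations on the
    same query preceded by in-context demonstrations (ICL path)."""
    assert len(student_acts) == len(teacher_acts)
    pairs = zip(student_acts, teacher_acts)
    return sum(mse(s, t) for s, t in pairs) / len(student_acts)

# Toy example: 2 layers, 3-dimensional activations.
sft = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
icl = [[0.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
loss = ia2_loss(sft, icl)
print(loss)
```

Minimizing such a loss during the priming phase would pull the student's query-only internal states toward the states the same model produces under ICL.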
Empirical demonstration of ICL-SFT activation divergence

The authors demonstrate empirically that in-context learning and supervised fine-tuning produce different internal activation patterns in language models, revealing that these two adaptation methods operate through distinct functional mechanisms rather than being functionally equivalent.

10 retrieved papers (one can refute)
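One simple way such a divergence could be probed (a sketch under assumed toy data, not the paper's actual analysis) is to compare per-layer activation vectors from the two regimes with cosine similarity; consistently low similarity at matched layers and token positions would indicate that the two adaptation methods drive the network through distinct internal states.

```python
# Hypothetical probe of ICL-vs-SFT activation divergence via cosine
# similarity; the toy vectors below stand in for activations captured
# (e.g., via forward hooks) at matched layers and token positions.
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Per-layer activations at the answer token under the two regimes (toy).
icl_layers = [[1.0, 0.0], [0.6, 0.8]]
sft_layers = [[0.0, 1.0], [0.8, 0.6]]

sims = [cosine_sim(i, s) for i, s in zip(icl_layers, sft_layers)]
print(sims)  # consistently low values would suggest distinct mechanisms
```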
Two-step SFT training pipeline with IA2 priming

The authors develop a two-step training pipeline where IA2 priming is performed before standard SFT. This pipeline significantly improves both accuracy and calibration of adapted models across 12 benchmarks and two model families, demonstrating practical benefits of functional alignment with ICL.

10 retrieved papers
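The sequential structure of the pipeline can be caricatured with a one-dimensional toy: gradient descent first on a priming objective whose minimum stands in for the ICL-aligned activation target, then on the task objective starting from the primed parameters. The scalar "model", quadratic losses, and learning rate below are purely illustrative assumptions.

```python
# Caricature of the two-phase schedule with a 1-D toy "model": descend
# a priming objective first (minimum at the ICL-aligned point), then the
# task objective. All quantities here are illustrative, not the paper's.

def descend(theta, grad, steps, lr=0.1):
    """Plain gradient descent on a scalar parameter."""
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

T_ICL, T_TASK = 1.0, 2.0  # stand-ins for the two objectives' minima
theta = 0.0
# Phase 1: IA2-style priming toward the ICL-aligned point.
theta = descend(theta, lambda m: 2 * (m - T_ICL), steps=50)
# Phase 2: standard SFT on the task objective, from the primed start.
theta = descend(theta, lambda m: 2 * (m - T_TASK), steps=50)
print(round(theta, 3))  # -> 2.0
```

The point of the toy is only the ordering: phase 2 starts from wherever phase 1 ends, so the priming step shapes the initialization of standard SFT rather than its objective.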

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

