IA2: Alignment with ICL Activations improves Supervised Fine-Tuning

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: In-Context Learning (ICL), Supervised Fine-Tuning (SFT), Adaptation
Abstract:

Supervised Fine-Tuning (SFT) specializes model behavior by training weights to produce intended target responses for queries. In contrast, In-Context Learning (ICL) adapts models at inference time through instructions or demonstrations in the prompt. In data-scarce settings, ICL can offer better generalization and more calibrated responses than SFT, at the cost of more inference compute. In this work, we ask: can ICL's internal computations be used to improve the quality of SFT? We first show that ICL and SFT produce distinct activation patterns, indicating that the two methods achieve adaptation through different functional mechanisms. Motivated by this observation, and to exploit ICL's rich functionality, we introduce ICL Activation Alignment (IA2), a self-distillation technique that aims to replicate ICL's activation patterns in SFT models and thereby incentivize ICL-like internal reasoning. Performing IA2 as a priming step before SFT significantly improves the accuracy and calibration of model outputs, as shown by extensive empirical results on 12 popular benchmarks and two model families. This finding is not only practically useful but also offers a conceptual window into the inner mechanics of model adaptation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces ICL Activation Alignment (IA2), a self-distillation technique that aligns supervised fine-tuning (SFT) activation patterns with those observed during in-context learning (ICL). It resides in the 'ICL-SFT Activation Alignment' leaf, which contains five papers total, indicating a moderately populated niche within the broader 'Activation-Based Alignment Methods' branch. This leaf specifically targets methods that bridge ICL and SFT through internal representation matching, distinguishing it from general alignment approaches that do not explicitly leverage activation-level insights from ICL mechanisms.

The taxonomy reveals neighboring research directions including 'Cross-Modal and Cross-Lingual Activation Alignment' (2 papers) and 'Attention Mechanism Activation Analysis' (2 papers), both exploring activation-based techniques but in different contexts. Parallel branches like 'ICL Demonstration Optimization' (3 papers) and 'ICL Mechanism Understanding' (2 papers) focus on improving or analyzing ICL itself rather than transferring its properties to SFT. The 'Multi-Objective and Preference-Based Alignment' branch (3 papers) and 'Self-Alignment and Minimal Supervision' (3 papers) pursue broader alignment paradigms without the specific activation-level ICL-SFT bridging that defines this work's contribution.

Of the 30 candidates examined (10 per contribution), the empirical demonstration of ICL-SFT activation divergence yielded one refutable candidate, suggesting some prior exploration of activation-pattern differences between these paradigms. The IA2 method itself and the two-step training pipeline were each compared against 10 candidates with zero refutations, indicating that these specific technical contributions appear less directly anticipated within the search scope. These statistics suggest that the core methodological innovation (IA2 as a priming step) may be more novel than the observation that ICL and SFT produce distinct activations, though the search scope remains constrained.

Based on top-30 semantic matches, the work appears to occupy a recognizable but not overcrowded research direction. The taxonomy structure shows this is one of several complementary approaches to leveraging ICL insights for improved alignment, with the activation-level focus providing a distinct angle compared to demonstration optimization or preference learning methods. The limited search scope means broader field coverage or more distant related work may not be fully captured.

Taxonomy

Core-task Taxonomy Papers: 37
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: Improving supervised fine-tuning through in-context learning activation alignment. The field addresses how to better align language models with desired behaviors by leveraging insights from in-context learning (ICL) mechanisms. The taxonomy reveals several complementary directions: Activation-Based Alignment Methods focus on directly manipulating or aligning internal model representations during training, often drawing connections between ICL dynamics and supervised fine-tuning; In-Context Learning Enhancement and Analysis investigates how to improve ICL itself through better demonstration selection, progressive strategies, and understanding of underlying mechanisms; Alignment Paradigms and Frameworks explore broader methodological approaches including self-alignment, principle-driven methods, and novel training objectives; Parameter-Efficient and Communication-Efficient Methods address scalability through techniques like low-rank adaptation and federated learning; and Domain-Specific Alignment Applications tailor these ideas to specialized contexts such as recommendation systems, healthcare diagnostics, and multilingual settings.

Representative works like Unlocking Spell[1] and In-Context Alignment[23] illustrate early efforts to bridge ICL and alignment, while Principle Driven Self-Alignment[7] and VPO[8] exemplify alternative paradigm innovations. A particularly active line of inquiry centers on understanding and exploiting the relationship between ICL's emergent capabilities and supervised fine-tuning's stability. Works like Progressive ICL[20] and Supervised ICL Fine-Tuning[25] explore how structured demonstration strategies can enhance learning, while Reasoning Distillation[5] and Rewards in Context[3] investigate transferring complex reasoning patterns.

ICL Activations Alignment[0] sits squarely within the activation-based branch, proposing that aligning internal activations between ICL and fine-tuning regimes can improve model performance. This approach contrasts with nearby methods like Missing Alignment Link[35] and Dual Alignment[37], which may emphasize different aspects of the alignment process or explore alternative bridging mechanisms between pre-training behaviors and task-specific adaptation. The central tension across these branches involves balancing the flexibility of ICL with the efficiency and robustness of fine-tuning, while maintaining interpretability of the underlying alignment mechanisms.

Claimed Contributions

IA2 (ICL Activation Alignment) method

The authors propose IA2, a self-distillation technique that aligns supervised fine-tuning models with the activation patterns produced during in-context learning. This priming step enforces functional alignment with ICL before standard SFT, enabling models to replicate ICL's internal reasoning mechanisms.

10 retrieved papers
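As a concrete illustration of the kind of objective such a self-distillation step could use, the sketch below computes an average per-layer mean-squared error between a student's activations on the bare query (SFT path) and a frozen teacher's activations on the demonstration-augmented prompt (ICL path). The representation (plain per-layer vectors at the answer position), the function names, and the MSE form are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of an IA2-style activation-alignment objective.
# Per-layer vectors and the MSE form are illustrative assumptions,
# not the authors' code.

def mse(a, b):
    """Mean squared error between two equal-length activation vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def ia2_loss(student_acts, teacher_acts):
    """Average per-layer MSE between the student's activations on the
    bare query (SFT path) and a frozen teacher's activations on the
    same query preceded by in-context demonstrations (ICL path)."""
    assert len(student_acts) == len(teacher_acts)
    pairs = zip(student_acts, teacher_acts)
    return sum(mse(s, t) for s, t in pairs) / len(student_acts)

# Toy example: 2 layers, 3-dimensional activations.
sft = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
icl = [[0.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
loss = ia2_loss(sft, icl)
print(loss)
```

Minimizing such a loss during the priming phase would pull the student's query-only internal states toward the states the same model produces under ICL.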
Empirical demonstration of ICL-SFT activation divergence

The authors demonstrate empirically that in-context learning and supervised fine-tuning produce different internal activation patterns in language models, revealing that these two adaptation methods operate through distinct functional mechanisms rather than being functionally equivalent.

10 retrieved papers (one can refute)
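One simple way such a divergence could be probed (a sketch under assumed toy data, not the paper's actual analysis) is to compare per-layer activation vectors from the two regimes with cosine similarity; consistently low similarity at matched layers and token positions would indicate that the two adaptation methods drive the network through distinct internal states.

```python
# Hypothetical probe of ICL-vs-SFT activation divergence via cosine
# similarity; the toy vectors below stand in for activations captured
# (e.g., via forward hooks) at matched layers and token positions.
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Per-layer activations at the answer token under the two regimes (toy).
icl_layers = [[1.0, 0.0], [0.6, 0.8]]
sft_layers = [[0.0, 1.0], [0.8, 0.6]]

sims = [cosine_sim(i, s) for i, s in zip(icl_layers, sft_layers)]
print(sims)  # consistently low values would suggest distinct mechanisms
```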
Two-step SFT training pipeline with IA2 priming

The authors develop a two-step training pipeline where IA2 priming is performed before standard SFT. This pipeline significantly improves both accuracy and calibration of adapted models across 12 benchmarks and two model families, demonstrating practical benefits of functional alignment with ICL.

10 retrieved papers
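The sequential structure of the pipeline can be caricatured with a one-dimensional toy: gradient descent first on a priming objective whose minimum stands in for the ICL-aligned activation target, then on the task objective starting from the primed parameters. The scalar "model", quadratic losses, and learning rate below are purely illustrative assumptions.

```python
# Caricature of the two-phase schedule with a 1-D toy "model": descend
# a priming objective first (minimum at the ICL-aligned point), then the
# task objective. All quantities here are illustrative, not the paper's.

def descend(theta, grad, steps, lr=0.1):
    """Plain gradient descent on a scalar parameter."""
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

T_ICL, T_TASK = 1.0, 2.0  # stand-ins for the two objectives' minima
theta = 0.0
# Phase 1: IA2-style priming toward the ICL-aligned point.
theta = descend(theta, lambda m: 2 * (m - T_ICL), steps=50)
# Phase 2: standard SFT on the task objective, from the primed start.
theta = descend(theta, lambda m: 2 * (m - T_TASK), steps=50)
print(round(theta, 3))  # -> 2.0
```

The point of the toy is only the ordering: phase 2 starts from wherever phase 1 ends, so the priming step shapes the initialization of standard SFT rather than its objective.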

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

