Pretrain–Test Task Alignment Governs Generalization in In-Context Learning

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: In-Context Learning · Task Alignment · Spectral Bias · Pretraining · Linear Attention

Abstract

In-context learning (ICL) is a central capability of Transformer models, but the structures in data that enable its emergence and govern its robustness remain poorly understood. In this work, we study how the structure of pretraining tasks governs generalization in ICL. Using a solvable model for ICL of linear regression by linear attention, we derive an exact expression for ICL generalization error in high dimensions under arbitrary pretraining–testing task covariance mismatch. This leads to a new alignment measure that quantifies how much information about the pretraining task distribution is useful for inference at test time. We show that this measure directly predicts ICL performance not only in the solvable model but also in nonlinear Transformers. Our analysis further reveals a tradeoff between specialization and generalization in ICL: depending on task distribution alignment, increasing pretraining task diversity can either improve or harm test performance. Together, these results identify train-test task alignment as a key determinant of generalization in ICL.
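
For readers unfamiliar with the solvable setting, the following is a minimal sketch of the kind of model the abstract describes: in-context linear regression with a Gaussian task prior whose covariance differs between pretraining and test. The notation (Σ_pre, Σ_test, the learned matrix Γ) is our illustration and is not taken from the paper.

```latex
% Illustrative setup (our notation, not the paper's): in-context linear
% regression with a task-covariance mismatch between pretraining and test.
\begin{align*}
  &\text{Pretraining tasks: } w \sim \mathcal{N}(0, \Sigma_{\mathrm{pre}}),
   \qquad \text{test tasks: } w \sim \mathcal{N}(0, \Sigma_{\mathrm{test}}), \\
  &\text{context examples: } y_i = w^\top x_i + \varepsilon_i,
   \quad i = 1, \dots, n, \\
  &\text{linear-attention prediction: }
   \hat{y}(x) = x^\top \Gamma \Big( \tfrac{1}{n}
   \textstyle\sum_{i=1}^{n} y_i x_i \Big), \\
  &\text{ICL generalization error: }
   \mathcal{E} = \mathbb{E}_{w \sim \mathcal{N}(0, \Sigma_{\mathrm{test}})}
   \big[ ( \hat{y}(x) - w^\top x )^2 \big].
\end{align*}
```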

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper derives an exact expression for in-context learning generalization error under arbitrary task covariance mismatch in a solvable linear regression model, and introduces an alignment measure quantifying how pretraining task information aids test-time inference. It resides in the 'Task Distribution Alignment and Mismatch' leaf, which contains only three papers total, making this a relatively sparse research direction within the broader taxonomy. The sibling papers examine necessary conditions for transfer and diversity effects, but this work uniquely provides closed-form error characterization under explicit train-test misalignment.

The taxonomy reveals that this leaf sits within the 'Distribution Shift and Out-of-Distribution Generalization' branch, which also includes input-level covariate shifts, compositional generalization, and novel task functions. Neighboring branches address theoretical mechanisms (Bayesian interpretations, optimization dynamics) and pretraining design (task diversity thresholds, meta-training). The scope note for this leaf explicitly focuses on task distribution alignment effects, excluding input-level shifts and compositional patterns. The paper's emphasis on task covariance structure and alignment measures directly targets this boundary, connecting theoretical foundations to distribution shift phenomena.

Among the 25 candidates examined, the first contribution (the exact error expression) has one potentially refuting candidate among the five examined, suggesting that some prior theoretical work on error characterization exists, though it may differ in scope or assumptions. For the second contribution (the alignment measure), ten candidates were examined and none clearly refutes it, indicating potential novelty in how alignment is quantified. The third contribution (task alignment as a key determinant) was likewise compared against ten candidates without clear refutation. These statistics reflect top semantic matches rather than exhaustive coverage, so relevant unexamined work may exist.

Given the sparse taxonomy leaf and limited refutation among 25 examined candidates, the work appears to occupy a relatively underexplored niche within in-context learning theory. The exact error derivation faces some prior overlap, while the alignment measure and its predictive role seem less directly anticipated. The analysis is constrained by the search scope and cannot rule out relevant work outside the top-25 semantic matches or beyond the citation network examined.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 1

Research Landscape Overview

Core task: Generalization in in-context learning under task distribution mismatch. The field examines how models trained on one set of tasks can adapt to new tasks presented in-context, particularly when test-time tasks differ from those seen during pretraining. The taxonomy reveals several complementary perspectives:

- Theoretical Foundations and Mechanisms explores what transformers can learn and why in-context learning works (e.g., An explanation of in-context[1], What can transformers learn[25]);
- Distribution Shift and Out-of-Distribution Generalization investigates robustness when task distributions diverge;
- Task Diversity and Pretraining Design studies how pretraining choices shape downstream adaptability (e.g., Pretraining task diversity and[27]);
- Empirical Characterization and Benchmarking provides systematic evaluations;
- Simple Function Classes as Testbeds uses controlled settings such as linear regression;
- Operator Learning and Scientific Computing applies these ideas to PDEs and scientific problems;
- Sequential Decision Making and Reinforcement Learning extends the framework to RL domains;
- Prompting and Demonstration Design optimizes how examples are selected and presented (e.g., Diverse demonstrations improve in-context[16]);
- Domain-Specific Applications tackles real-world adaptation challenges;
- Generalization Theory and Frameworks formalizes when and how generalization occurs.

Recent work highlights tensions between in-distribution success and out-of-distribution robustness. Several studies show that while in-context learning can generalize impressively within familiar task families (In-context learning generalizes but[5]), performance often degrades under distribution shift (Out-of-Distribution Generalization of In-Context[7], When can in-context learning[2]).

Pretrain–Test Task Alignment Governs[0] sits squarely within the Distribution Shift branch, specifically addressing Task Distribution Alignment and Mismatch. It emphasizes how alignment between pretraining and test task distributions fundamentally determines generalization quality, contrasting with neighbors such as When can in-context learning[2], which characterizes necessary conditions for successful transfer, and Pretraining task diversity and[27], which explores how diversity during pretraining affects downstream robustness. The central open question remains: what structural properties of pretraining distributions enable models to handle novel task families at test time?

Claimed Contributions

Exact expression for ICL generalization error under arbitrary task covariance mismatch

The authors derive a closed-form formula for the in-context learning generalization error of a linear attention model performing linear regression. This formula applies in high dimensions and allows for arbitrary mismatch between the covariance structures of pretraining and test task distributions, generalizing prior work that assumed identical distributions.

Retrieved papers: 5 · Status: Can Refute
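
As a sanity check of what such a closed-form result describes, the following is a minimal numerical sketch (our construction, not the paper's derivation or experiment): it fits a linear-attention-style predictor ŷ(x) = xᵀΓ(1/n Σᵢ yᵢxᵢ) on tasks drawn from one covariance and measures its error on tasks drawn from another. The specific covariances and the least-squares fitting of Γ are illustrative assumptions.

```python
# Minimal sketch (our construction): generalization error of a
# linear-attention-style ICL predictor under task-covariance mismatch.
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 16          # input dimension, context length
noise = 0.1               # label-noise standard deviation

def sample_batch(sigma_task, n_tasks):
    """Sample (feature, target) pairs for fitting/evaluating vec(Gamma)."""
    feats, targets = [], []
    chol = np.linalg.cholesky(sigma_task)
    for _ in range(n_tasks):
        w = chol @ rng.standard_normal(d)                # task vector
        X = rng.standard_normal((n_ctx, d))              # context inputs
        y = X @ w + noise * rng.standard_normal(n_ctx)   # context labels
        m = X.T @ y / n_ctx                              # attention summary
        x_q = rng.standard_normal(d)                     # query input
        # x_q^T Gamma m is linear in vec(Gamma) with feature vec(x_q m^T).
        feats.append(np.outer(x_q, m).ravel())
        targets.append(x_q @ w)                          # clean query label
    return np.array(feats), np.array(targets)

# Pretraining: isotropic tasks; test: tasks concentrated on two directions.
sigma_pre = np.eye(d)
sigma_test = np.diag([4.0, 4.0] + [0.01] * (d - 2))

# Fit vec(Gamma) by least squares on pretraining tasks.
F, t = sample_batch(sigma_pre, 20000)
gamma_vec, *_ = np.linalg.lstsq(F, t, rcond=None)

# Evaluate error on matched vs. mismatched task distributions.
for name, sigma in [("matched", sigma_pre), ("mismatched", sigma_test)]:
    Fe, te = sample_batch(sigma, 5000)
    err = np.mean((Fe @ gamma_vec - te) ** 2)
    print(f"{name:10s} task covariance: ICL error = {err:.3f}")
```

In this construction, any gap between the matched and mismatched errors is driven purely by the task-covariance mismatch, which is the quantity the paper's closed-form expression is said to characterize.
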
Alignment measure quantifying useful pretraining information for test-time inference

The authors introduce a novel alignment measure that captures how much information from the pretraining task distribution is relevant for test-time inference. This measure directly predicts ICL performance in both the solvable linear model and nonlinear Transformers.

Retrieved papers: 10 · Status: Not Refuted
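
The report does not reproduce the paper's definition of the measure. As a point of reference only, one natural covariance-overlap statistic of the kind such a measure might resemble is the normalized trace inner product below; this specific form is our illustration, not the paper's definition.

```latex
% Illustrative covariance-alignment statistic (our example, not the
% paper's definition): normalized trace inner product of task covariances.
\[
  A(\Sigma_{\mathrm{pre}}, \Sigma_{\mathrm{test}})
  = \frac{\operatorname{Tr}(\Sigma_{\mathrm{pre}} \Sigma_{\mathrm{test}})}
         {\|\Sigma_{\mathrm{pre}}\|_F \, \|\Sigma_{\mathrm{test}}\|_F}
\]
```

By the Cauchy-Schwarz inequality this quantity lies in [0, 1] for positive semidefinite covariances, reaching 1 exactly when the two covariances are proportional.
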
Identification of train-test task alignment as key determinant of ICL generalization

The authors establish that the alignment between pretraining and test task distributions is a fundamental factor governing generalization in in-context learning. They reveal a tradeoff between specialization and generalization, showing that increasing pretraining task diversity can either improve or harm test performance depending on task distribution alignment.

Retrieved papers: 10 · Status: Not Refuted
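
The specialization-generalization tradeoff described here can be illustrated with a toy comparison (our construction, not the paper's experiment): a Bayesian in-context predictor that uses the pretraining task covariance as its prior versus one that uses a generic isotropic prior, evaluated on aligned and misaligned test task distributions. All covariances and the posterior-mean predictor below are illustrative assumptions.

```python
# Minimal sketch (our construction): specialization vs. generalization
# for in-context linear regression under aligned/misaligned test tasks.
import numpy as np

rng = np.random.default_rng(1)
d, n_ctx, noise = 8, 12, 0.3

def posterior_mean(X, y, prior):
    """Bayes posterior mean of w under prior N(0, prior), Gaussian noise."""
    K = X @ prior @ X.T + noise**2 * np.eye(len(y))
    return prior @ X.T @ np.linalg.solve(K, y)

def test_error(prior, sigma_test, n_trials=2000):
    """Average squared task-recovery error over sampled test tasks."""
    chol = np.linalg.cholesky(sigma_test)
    errs = []
    for _ in range(n_trials):
        w = chol @ rng.standard_normal(d)
        X = rng.standard_normal((n_ctx, d))
        y = X @ w + noise * rng.standard_normal(n_ctx)
        errs.append(np.sum((posterior_mean(X, y, prior) - w) ** 2))
    return np.mean(errs)

# Pretraining prior concentrated on the first two directions.
sigma_pre = np.diag([4.0, 4.0] + [0.01] * (d - 2))
priors = {"specialized (Sigma_pre)": sigma_pre,
          "generic (isotropic)": np.eye(d)}

# Aligned test tasks share the pretraining covariance; misaligned tasks
# live on the complementary directions.
tests = {"aligned": sigma_pre,
         "misaligned": np.diag([0.01, 0.01] + [4.0 / (d - 2)] * (d - 2))}

for t_name, sigma_test in tests.items():
    for p_name, prior in priors.items():
        print(f"{t_name:10s} | {p_name:22s}: "
              f"error = {test_error(prior, sigma_test):.3f}")
```

On aligned test tasks the specialized prior is, by construction, Bayes optimal; on misaligned tasks it over-shrinks the directions where the true signal lives and the generic prior wins, mirroring the claimed tradeoff.
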

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Exact expression for ICL generalization error under arbitrary task covariance mismatch (claim restated under Claimed Contributions above; 5 candidate papers compared, one of which can refute).

Contribution 2: Alignment measure quantifying useful pretraining information for test-time inference (claim restated above; 10 candidate papers compared, none clearly refuting).

Contribution 3: Identification of train-test task alignment as key determinant of ICL generalization (claim restated above; 10 candidate papers compared, none clearly refuting).