Pretrain–Test Task Alignment Governs Generalization in In-Context Learning

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: In-Context Learning · Task Alignment · Spectral Bias · Pretraining · Linear Attention

Abstract

In-context learning (ICL) is a central capability of Transformer models, but the structures in data that enable its emergence and govern its robustness remain poorly understood. In this work, we study how the structure of pretraining tasks governs generalization in ICL. Using a solvable model for ICL of linear regression by linear attention, we derive an exact expression for ICL generalization error in high dimensions under arbitrary pretraining–testing task covariance mismatch. This leads to a new alignment measure that quantifies how much information about the pretraining task distribution is useful for inference at test time. We show that this measure directly predicts ICL performance not only in the solvable model but also in nonlinear Transformers. Our analysis further reveals a tradeoff between specialization and generalization in ICL: depending on task distribution alignment, increasing pretraining task diversity can either improve or harm test performance. Together, these results identify train-test task alignment as a key determinant of generalization in ICL.
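
For readers unfamiliar with the solvable setting, the following is a minimal sketch of the kind of model the abstract describes: in-context linear regression with a Gaussian task prior whose covariance differs between pretraining and test. The notation (Σ_pre, Σ_test, the learned matrix Γ) is our illustration and is not taken from the paper.

```latex
% Illustrative setup (our notation, not the paper's): in-context linear
% regression with a task-covariance mismatch between pretraining and test.
\begin{align*}
  &\text{Pretraining tasks: } w \sim \mathcal{N}(0, \Sigma_{\mathrm{pre}}),
   \qquad \text{test tasks: } w \sim \mathcal{N}(0, \Sigma_{\mathrm{test}}), \\
  &\text{context examples: } y_i = w^\top x_i + \varepsilon_i,
   \quad i = 1, \dots, n, \\
  &\text{linear-attention prediction: }
   \hat{y}(x) = x^\top \Gamma \Big( \tfrac{1}{n}
   \textstyle\sum_{i=1}^{n} y_i x_i \Big), \\
  &\text{ICL generalization error: }
   \mathcal{E} = \mathbb{E}_{w \sim \mathcal{N}(0, \Sigma_{\mathrm{test}})}
   \big[ ( \hat{y}(x) - w^\top x )^2 \big].
\end{align*}
```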

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper derives an exact expression for in-context learning generalization error under arbitrary task covariance mismatch in a solvable linear regression model, and introduces an alignment measure quantifying how pretraining task information aids test-time inference. It resides in the 'Task Distribution Alignment and Mismatch' leaf, which contains only three papers total, making this a relatively sparse research direction within the broader taxonomy. The sibling papers examine necessary conditions for transfer and diversity effects, but this work uniquely provides closed-form error characterization under explicit train-test misalignment.

The taxonomy reveals that this leaf sits within the 'Distribution Shift and Out-of-Distribution Generalization' branch, which also includes input-level covariate shifts, compositional generalization, and novel task functions. Neighboring branches address theoretical mechanisms (Bayesian interpretations, optimization dynamics) and pretraining design (task diversity thresholds, meta-training). The scope note for this leaf explicitly focuses on task distribution alignment effects, excluding input-level shifts and compositional patterns. The paper's emphasis on task covariance structure and alignment measures directly targets this boundary, connecting theoretical foundations to distribution shift phenomena.

Among the 25 candidates examined, the first contribution (the exact error expression) has one potentially refuting candidate among the five examined, suggesting that some prior theoretical work on error characterization exists, though it may differ in scope or assumptions. For the second contribution (the alignment measure), ten candidates were examined and none clearly refutes it, indicating potential novelty in how alignment is quantified. The third contribution (task alignment as a key determinant) was likewise compared against ten candidates without clear refutation. These statistics reflect top semantic matches rather than exhaustive coverage, so relevant unexamined work may exist.

Given the sparse taxonomy leaf and limited refutation among 25 examined candidates, the work appears to occupy a relatively underexplored niche within in-context learning theory. The exact error derivation faces some prior overlap, while the alignment measure and its predictive role seem less directly anticipated. The analysis is constrained by the search scope and cannot rule out relevant work outside the top-25 semantic matches or beyond the citation network examined.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 1

Research Landscape Overview

Core task: Generalization in in-context learning under task distribution mismatch. The field examines how models trained on one set of tasks can adapt to new tasks presented in-context, particularly when test-time tasks differ from those seen during pretraining. The taxonomy reveals several complementary perspectives:

- Theoretical Foundations and Mechanisms explores what transformers can learn and why in-context learning works (e.g., An explanation of in-context[1], What can transformers learn[25]);
- Distribution Shift and Out-of-Distribution Generalization investigates robustness when task distributions diverge;
- Task Diversity and Pretraining Design studies how pretraining choices shape downstream adaptability (e.g., Pretraining task diversity and[27]);
- Empirical Characterization and Benchmarking provides systematic evaluations;
- Simple Function Classes as Testbeds uses controlled settings such as linear regression;
- Operator Learning and Scientific Computing applies these ideas to PDEs and scientific problems;
- Sequential Decision Making and Reinforcement Learning extends the framework to RL domains;
- Prompting and Demonstration Design optimizes how examples are selected and presented (e.g., Diverse demonstrations improve in-context[16]);
- Domain-Specific Applications tackles real-world adaptation challenges;
- Generalization Theory and Frameworks formalizes when and how generalization occurs.

Recent work highlights tensions between in-distribution success and out-of-distribution robustness. Several studies show that while in-context learning can generalize impressively within familiar task families (In-context learning generalizes but[5]), performance often degrades under distribution shift (Out-of-Distribution Generalization of In-Context[7], When can in-context learning[2]).

Pretrain–Test Task Alignment Governs[0] sits squarely within the Distribution Shift branch, specifically addressing Task Distribution Alignment and Mismatch. It emphasizes how alignment between pretraining and test task distributions fundamentally determines generalization quality, contrasting with neighbors such as When can in-context learning[2], which characterizes necessary conditions for successful transfer, and Pretraining task diversity and[27], which explores how diversity during pretraining affects downstream robustness. The central open question remains: what structural properties of pretraining distributions enable models to handle novel task families at test time?

Claimed Contributions

Exact expression for ICL generalization error under arbitrary task covariance mismatch

The authors derive a closed-form formula for the in-context learning generalization error of a linear attention model performing linear regression. This formula applies in high dimensions and allows for arbitrary mismatch between the covariance structures of pretraining and test task distributions, generalizing prior work that assumed identical distributions.

Retrieved papers: 5 · Status: Can Refute
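
As a sanity check of what such a closed-form result describes, the following is a minimal numerical sketch (our construction, not the paper's derivation or experiment): it fits a linear-attention-style predictor ŷ(x) = xᵀΓ(1/n Σᵢ yᵢxᵢ) on tasks drawn from one covariance and measures its error on tasks drawn from another. The specific covariances and the least-squares fitting of Γ are illustrative assumptions.

```python
# Minimal sketch (our construction): generalization error of a
# linear-attention-style ICL predictor under task-covariance mismatch.
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 16          # input dimension, context length
noise = 0.1               # label-noise standard deviation

def sample_batch(sigma_task, n_tasks):
    """Sample (feature, target) pairs for fitting/evaluating vec(Gamma)."""
    feats, targets = [], []
    chol = np.linalg.cholesky(sigma_task)
    for _ in range(n_tasks):
        w = chol @ rng.standard_normal(d)                # task vector
        X = rng.standard_normal((n_ctx, d))              # context inputs
        y = X @ w + noise * rng.standard_normal(n_ctx)   # context labels
        m = X.T @ y / n_ctx                              # attention summary
        x_q = rng.standard_normal(d)                     # query input
        # x_q^T Gamma m is linear in vec(Gamma) with feature vec(x_q m^T).
        feats.append(np.outer(x_q, m).ravel())
        targets.append(x_q @ w)                          # clean query label
    return np.array(feats), np.array(targets)

# Pretraining: isotropic tasks; test: tasks concentrated on two directions.
sigma_pre = np.eye(d)
sigma_test = np.diag([4.0, 4.0] + [0.01] * (d - 2))

# Fit vec(Gamma) by least squares on pretraining tasks.
F, t = sample_batch(sigma_pre, 20000)
gamma_vec, *_ = np.linalg.lstsq(F, t, rcond=None)

# Evaluate error on matched vs. mismatched task distributions.
for name, sigma in [("matched", sigma_pre), ("mismatched", sigma_test)]:
    Fe, te = sample_batch(sigma, 5000)
    err = np.mean((Fe @ gamma_vec - te) ** 2)
    print(f"{name:10s} task covariance: ICL error = {err:.3f}")
```

In this construction, any gap between the matched and mismatched errors is driven purely by the task-covariance mismatch, which is the quantity the paper's closed-form expression is said to characterize.
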
Alignment measure quantifying useful pretraining information for test-time inference

The authors introduce a novel alignment measure that captures how much information from the pretraining task distribution is relevant for test-time inference. This measure directly predicts ICL performance in both the solvable linear model and nonlinear Transformers.

Retrieved papers: 10 · Status: Not Refuted
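
The report does not reproduce the paper's definition of the measure. As a point of reference only, one natural covariance-overlap statistic of the kind such a measure might resemble is the normalized trace inner product below; this specific form is our illustration, not the paper's definition.

```latex
% Illustrative covariance-alignment statistic (our example, not the
% paper's definition): normalized trace inner product of task covariances.
\[
  A(\Sigma_{\mathrm{pre}}, \Sigma_{\mathrm{test}})
  = \frac{\operatorname{Tr}(\Sigma_{\mathrm{pre}} \Sigma_{\mathrm{test}})}
         {\|\Sigma_{\mathrm{pre}}\|_F \, \|\Sigma_{\mathrm{test}}\|_F}
\]
```

By the Cauchy-Schwarz inequality this quantity lies in [0, 1] for positive semidefinite covariances, reaching 1 exactly when the two covariances are proportional.
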
Identification of train-test task alignment as key determinant of ICL generalization

The authors establish that the alignment between pretraining and test task distributions is a fundamental factor governing generalization in in-context learning. They reveal a tradeoff between specialization and generalization, showing that increasing pretraining task diversity can either improve or harm test performance depending on task distribution alignment.

Retrieved papers: 10 · Status: Not Refuted
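
The specialization-generalization tradeoff described here can be illustrated with a toy comparison (our construction, not the paper's experiment): a Bayesian in-context predictor that uses the pretraining task covariance as its prior versus one that uses a generic isotropic prior, evaluated on aligned and misaligned test task distributions. All covariances and the posterior-mean predictor below are illustrative assumptions.

```python
# Minimal sketch (our construction): specialization vs. generalization
# for in-context linear regression under aligned/misaligned test tasks.
import numpy as np

rng = np.random.default_rng(1)
d, n_ctx, noise = 8, 12, 0.3

def posterior_mean(X, y, prior):
    """Bayes posterior mean of w under prior N(0, prior), Gaussian noise."""
    K = X @ prior @ X.T + noise**2 * np.eye(len(y))
    return prior @ X.T @ np.linalg.solve(K, y)

def test_error(prior, sigma_test, n_trials=2000):
    """Average squared task-recovery error over sampled test tasks."""
    chol = np.linalg.cholesky(sigma_test)
    errs = []
    for _ in range(n_trials):
        w = chol @ rng.standard_normal(d)
        X = rng.standard_normal((n_ctx, d))
        y = X @ w + noise * rng.standard_normal(n_ctx)
        errs.append(np.sum((posterior_mean(X, y, prior) - w) ** 2))
    return np.mean(errs)

# Pretraining prior concentrated on the first two directions.
sigma_pre = np.diag([4.0, 4.0] + [0.01] * (d - 2))
priors = {"specialized (Sigma_pre)": sigma_pre,
          "generic (isotropic)": np.eye(d)}

# Aligned test tasks share the pretraining covariance; misaligned tasks
# live on the complementary directions.
tests = {"aligned": sigma_pre,
         "misaligned": np.diag([0.01, 0.01] + [4.0 / (d - 2)] * (d - 2))}

for t_name, sigma_test in tests.items():
    for p_name, prior in priors.items():
        print(f"{t_name:10s} | {p_name:22s}: "
              f"error = {test_error(prior, sigma_test):.3f}")
```

On aligned test tasks the specialized prior is, by construction, Bayes optimal; on misaligned tasks it over-shrinks the directions where the true signal lives and the generic prior wins, mirroring the claimed tradeoff.
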

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Exact expression for ICL generalization error under arbitrary task covariance mismatch (claim restated under Claimed Contributions above; 5 candidate papers compared, one of which can refute).

Contribution 2: Alignment measure quantifying useful pretraining information for test-time inference (claim restated above; 10 candidate papers compared, none clearly refuting).

Contribution 3: Identification of train-test task alignment as key determinant of ICL generalization (claim restated above; 10 candidate papers compared, none clearly refuting).