Transfer Learning in Infinite Width Feature Learning Networks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Transfer Learning; Infinite Width; Kernel Methods
Abstract:

We develop a theory of transfer learning in infinitely wide neural networks under gradient flow that quantifies when pretraining on a source task improves generalization on a target task. We analyze both (i) fine-tuning, where the downstream predictor is trained on top of source-induced features, and (ii) a jointly rich setting, where both the pretraining and downstream tasks can operate in a feature learning regime but the downstream model is initialized with the features obtained after pretraining. In this setup, the summary statistics of randomly initialized networks after rich pretraining are adaptive kernels that depend on both the source data and labels. For (i), we analyze the performance of a readout across different pretraining data regimes. For (ii), the summary statistics after learning the target task are still adaptive kernels, with features from both the source and target tasks. We test our theory on linear and polynomial regression tasks as well as real datasets. Our theory yields interpretable conclusions about performance, which depend on the amount of data for each task, the alignment between tasks, and the feature learning strength.
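Stated schematically, and using hypothetical notation ($\Phi_{\mathrm{src}}$ for the post-pretraining hidden representation, $\theta_{\mathrm{src}}$ for the pretrained parameters) that is not taken from the submission itself, the two settings amount to:

```latex
% Sketch of the two transfer settings described in the abstract; the symbols
% \Phi_src and \theta_src are illustrative, not the paper's own notation.
% (i) Fine-tuning: features frozen after source pretraining; only a linear
%     readout a is fit on the target data (x_\mu, y_\mu), \mu = 1, ..., P_T.
f_{\mathrm{FT}}(x) = a^{\top}\Phi_{\mathrm{src}}(x),
\qquad
a = \arg\min_{a}\ \frac{1}{2}\sum_{\mu=1}^{P_T}\bigl(a^{\top}\Phi_{\mathrm{src}}(x_\mu) - y_\mu\bigr)^{2}.
% (ii) Jointly rich: all parameters continue gradient-flow training on the
%      target task, initialized from the pretrained state.
\theta(0) = \theta_{\mathrm{src}},
\qquad
\frac{d\theta}{dt} = -\nabla_{\theta}\,\mathcal{L}_{\mathrm{target}}\bigl(\theta(t)\bigr).
```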

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper develops a theory of transfer learning in infinitely wide neural networks under gradient flow, analyzing both fine-tuning and jointly rich settings where pretraining induces adaptive kernels. It resides in the 'Tensor Programs and Parametrization Schemes' leaf, which contains only three papers total. This is a relatively sparse research direction within the broader taxonomy of sixteen papers across eleven leaf nodes, suggesting the paper targets a specialized intersection of feature learning theory and transfer learning that has received limited prior attention in the infinite-width literature.

The taxonomy reveals that the paper's immediate neighbors focus on parametrization methods enabling feature learning beyond kernel regimes, while sibling leaves address alternative scaling schemes and depth-dependent hyperparameter transfer. Nearby branches examine optimization dynamics through mean-field or NTK lenses, and application domains including adversarial robustness and computational implementations. The paper bridges feature learning theory with transfer learning applications, a connection less explored in adjacent leaves that either emphasize pure theoretical frameworks or domain-specific empirical studies without the transfer learning focus.

Among the twenty-nine candidates examined, the contribution-level statistics show varied novelty profiles. For the core theory of transfer learning in infinite-width feature learning networks, nine candidates were examined with zero refutations, suggesting this formulation is relatively unexplored within the limited search scope. For the adaptive kernel characterization after rich pretraining, ten candidates were likewise examined without refutation. However, for the linear toy models with explicit generalization analysis, ten candidates were examined and one refutable match was found, indicating some overlap with prior analytical frameworks in simplified settings, though the transfer learning context may still differentiate the approach.

Based on the limited search of twenty-nine semantically related papers, the work appears to occupy a niche intersection with modest prior coverage. The taxonomy structure confirms sparse activity in this specific leaf, though the single refutation for toy model analysis suggests caution about claiming complete novelty for all technical components. The analysis does not cover exhaustive literature beyond top-K semantic matches, so additional related work may exist outside this scope.

Taxonomy

Core-task Taxonomy Papers: 16
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 1

Research Landscape Overview

Core task: transfer learning in infinitely wide neural networks under gradient flow. The field structure reflects a multi-faceted investigation into how neural networks behave as width approaches infinity, organized into four main branches. Feature Learning Theory in Infinite-Width Limits examines the mathematical foundations of representation learning and parametrization schemes, including tensor program frameworks that characterize feature evolution. Optimization Dynamics and Convergence Analysis focuses on training trajectories and convergence guarantees under gradient-based methods, often leveraging neural tangent kernel perspectives or mean-field limits. Application Domains and Empirical Studies explores practical settings such as hyperparameter transfer, neural architecture search, and domain-specific tasks, while Cross-Disciplinary Extensions and Surveys synthesizes insights from related areas including convex formulations and expressive power theory. Together, these branches span theoretical rigor, algorithmic development, and empirical validation.

A particularly active line of work centers on tensor programs and feature learning regimes, where studies like Tensor Programs Feature Learning[1] and Feature Learning Infinite Width[2] establish how different parametrizations enable or suppress representation learning at infinite width. Transfer Learning Infinite Width[0] sits squarely within this cluster, extending these frameworks to the transfer learning setting and analyzing how pre-trained features evolve under gradient flow. Nearby, Global Convergence Rich Feature[3] investigates convergence properties when networks do learn features, contrasting with kernel-regime analyses such as Neural Tangent Kernel Robustness[4]. Another strand addresses computational and practical concerns, exemplified by Efficient Infinite Width Computation[5] and Depthwise Hyperparameter Transfer[6], which bridge theory and scalable implementation.

The central tension across these directions is whether infinite-width limits yield lazy (kernel) or rich (feature-learning) dynamics, and how transfer scenarios modulate this trade-off.

Claimed Contributions

Theory of transfer learning in infinite width feature learning networks

The authors present a theoretical framework for analyzing transfer learning in infinite-width neural networks trained with gradient flow in the mean-field/μP parameterization. This theory characterizes when and how pretraining on a source task benefits generalization on a downstream target task, covering both fine-tuning and jointly rich learning settings.

9 retrieved papers
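As a rough, finite-width illustration of this setup (a sketch under assumed tasks and hyperparameters, not the paper's infinite-width gradient-flow analysis), the snippet below pretrains a two-layer network on a source task and then compares fine-tuning only the readout against continuing full training on the target:

```python
# Minimal finite-width sketch of the two transfer settings (illustrative only).
# A two-layer network is pretrained on a source task, then either (i) only the
# readout is retrained on the target ("fine-tuning") or (ii) all parameters
# continue training from the pretrained state ("jointly rich").  The task
# construction and hyperparameters below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
D, N = 20, 512                      # input dimension, hidden width

# Source and target are noiseless linear tasks with partially aligned directions.
w_src = rng.standard_normal(D); w_src /= np.linalg.norm(w_src)
v = rng.standard_normal(D); v -= (v @ w_src) * w_src; v /= np.linalg.norm(v)
w_tgt = 0.8 * w_src + 0.6 * v       # alignment <w_src, w_tgt> = 0.8

def make_task(w, n):
    X = rng.standard_normal((n, D)) / np.sqrt(D)
    return X, X @ w

def forward(X, W, a):
    h = np.tanh(X @ W.T)            # hidden features, shape (n_samples, N)
    return h @ a / N, h             # mean-field readout scaling 1/N

def train(X, y, W, a, steps=500, lr=0.5, train_features=True):
    n = len(y)
    for _ in range(steps):
        f, h = forward(X, W, a)
        err = f - y
        grad_a = h.T @ err / (N * n)              # gradient of squared loss w.r.t. a
        a = a - lr * N * grad_a                   # width-scaled (muP-style) step
        if train_features:
            back = np.outer(err, a) * (1 - h**2) / N
            W = W - lr * N * (back.T @ X) / n     # width-scaled hidden-weight step
    return W, a

X_s, y_s = make_task(w_src, 400)     # plentiful source data
X_t, y_t = make_task(w_tgt, 10)      # scarce target data
X_te, y_te = make_task(w_tgt, 2000)  # target test set

W0, a0 = rng.standard_normal((N, D)), rng.standard_normal(N)
W_pre, a_pre = train(X_s, y_s, W0.copy(), a0.copy())                           # rich pretraining

_, a_ft = train(X_t, y_t, W_pre.copy(), a_pre.copy(), train_features=False)    # (i) fine-tune
W_jr, a_jr = train(X_t, y_t, W_pre.copy(), a_pre.copy(), train_features=True)  # (ii) jointly rich
W_sc, a_sc = train(X_t, y_t, W0.copy(), a0.copy(), train_features=True)        # no pretraining

for name, (W, a) in [("fine-tune", (W_pre, a_ft)),
                     ("jointly rich", (W_jr, a_jr)),
                     ("target only", (W_sc, a_sc))]:
    pred, _ = forward(X_te, W, a)
    print(f"{name:12s} target test MSE: {np.mean((pred - y_te) ** 2):.5f}")
```

The width-scaled learning rate keeps parameter movement non-vanishing as N grows, mimicking a feature-learning rather than lazy regime, and the two knobs this toy exposes, alignment between w_src and w_tgt and the amount of target data, are the quantities the claimed theory tracks.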
Adaptive kernel characterization after rich pretraining

The work shows that after feature learning on the source task, the network's behavior can be characterized by adaptive kernels that incorporate information from both the source data and labels. These kernels differ from fixed kernels at initialization and enable analysis of downstream task performance.

10 retrieved papers
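One natural way to make this concrete, sketched here with assumed notation rather than the paper's own formulas, is that a ridge readout on the frozen post-pretraining features is exactly kernel ridge regression with the induced feature kernel:

```latex
% Assumed notation: \Phi_src(x) is the width-N hidden representation after
% rich pretraining on the source task, so K_src depends on the source inputs
% and labels, unlike the NNGP/NTK kernels of a randomly initialized network.
K_{\mathrm{src}}(x, x') = \frac{1}{N}\,\Phi_{\mathrm{src}}(x)^{\top}\Phi_{\mathrm{src}}(x'),
\qquad
\hat f(x) = K_{\mathrm{src}}(x, X_T)\,\bigl(K_{\mathrm{src}}(X_T, X_T) + \lambda I\bigr)^{-1} y_T .
```

Under this reading, downstream generalization reduces to a kernel-regression analysis with a data- and label-dependent kernel, which is what allows the theory to quantify when pretraining helps the target task.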
Linear toy models with explicit generalization analysis

The authors introduce tractable linear toy models that allow explicit computation of average-case test losses for transfer learning scenarios. These models reveal how data regime, task alignment, and feature learning strength determine whether transfer learning succeeds or fails, including conditions for negative transfer.

10 retrieved papers
Can refute: 1 retrieved paper potentially refutes this contribution
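To illustrate the kind of conclusion such a toy model can support, the hypothetical example below assumes that rich source pretraining collapses the learned feature onto the source direction, and compares a readout on that single feature against least squares trained from scratch on the target samples; the feature-collapse assumption and all numbers are illustrative and not taken from the paper:

```python
# Hypothetical linear toy (not the paper's exact model): transfer uses only the
# source direction w_src as a learned feature, and is compared against fitting
# the target task from scratch, as a function of the alignment rho = <w_src, w_tgt>.
import numpy as np

rng = np.random.default_rng(1)
D, n_target, n_test, trials = 50, 30, 5000, 20

def transfer_vs_scratch(rho):
    # Unit-norm source and target directions with alignment rho.
    w_src = np.zeros(D); w_src[0] = 1.0
    w_tgt = np.zeros(D); w_tgt[0] = rho; w_tgt[1] = np.sqrt(1.0 - rho**2)
    transfer_err, scratch_err = [], []
    for _ in range(trials):
        X = rng.standard_normal((n_target, D))
        y = X @ w_tgt                                  # noiseless linear target task
        X_te = rng.standard_normal((n_test, D))
        y_te = X_te @ w_tgt
        # (a) Transfer: single source-aligned feature plus a least-squares scalar readout.
        phi, phi_te = X @ w_src, X_te @ w_src
        c = (phi @ y) / (phi @ phi)
        transfer_err.append(np.mean((c * phi_te - y_te) ** 2))
        # (b) From scratch: minimum-norm least squares on the raw target data.
        w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        scratch_err.append(np.mean((X_te @ w_hat - y_te) ** 2))
    return np.mean(transfer_err), np.mean(scratch_err)

for rho in (0.95, 0.7, 0.3):
    t_err, s_err = transfer_vs_scratch(rho)
    print(f"alignment rho={rho:.2f}:  transfer MSE={t_err:.3f}  "
          f"(population optimum 1-rho^2={1 - rho**2:.3f}),  scratch MSE={s_err:.3f}")
```

In this simplified setting the transferred readout cannot beat a test error of about 1 - rho^2, while the from-scratch error shrinks as the number of target samples grows, so transfer helps when alignment is high or target data is scarce and hurts otherwise, which is one way negative transfer can arise.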

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
