Transfer Learning in Infinite Width Feature Learning Networks
Overview
Overall Novelty Assessment
The paper develops a theory of transfer learning in infinitely wide neural networks under gradient flow, analyzing both the fine-tuning and jointly rich learning settings, in which pretraining induces adaptive kernels. It resides in the 'Tensor Programs and Parametrization Schemes' leaf, which contains only three papers in total. This is a relatively sparse research direction within the broader taxonomy of sixteen papers across eleven leaf nodes, suggesting that the paper targets a specialized intersection of feature learning theory and transfer learning that has received limited prior attention in the infinite-width literature.
The taxonomy reveals that the paper's immediate neighbors focus on parametrization methods that enable feature learning beyond kernel regimes, while sibling leaves address alternative scaling schemes and depth-dependent hyperparameter transfer. Nearby branches examine optimization dynamics through mean-field or NTK lenses, as well as application domains including adversarial robustness and computational implementations. The paper bridges feature learning theory with transfer learning applications, a connection less explored in the adjacent leaves, which either emphasize pure theoretical frameworks or domain-specific empirical studies without a transfer learning focus.
Among the twenty-nine candidates examined, the contribution-level statistics show varied novelty profiles. For the core theory of transfer learning in infinite-width feature learning networks, nine candidates were examined with zero refutations, suggesting this formulation is relatively unexplored within the limited search scope. The adaptive kernel characterization after rich pretraining was similarly checked against ten candidates without refutation. However, the linear toy models with explicit generalization analysis were checked against ten candidates and yielded one refuted match, indicating some overlap with prior analytical frameworks in simplified settings, though the transfer learning context may still differentiate the approach.
Based on this limited search of twenty-nine semantically related papers, the work appears to occupy a niche intersection with modest prior coverage. The taxonomy structure confirms sparse activity in this specific leaf, though the single refutation for the toy-model analysis counsels caution about claiming complete novelty for all technical components. The analysis does not exhaustively cover the literature beyond the top-K semantic matches, so additional related work may exist outside this scope.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors present a theoretical framework for analyzing transfer learning in infinite-width neural networks trained with gradient flow in the mean-field/μP parameterization. This theory characterizes when and how pretraining on a source task benefits generalization on a downstream target task, covering both fine-tuning and jointly rich learning settings.
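To make the parameterization concrete, here is a minimal sketch (ours, not the authors' code) of a two-layer network in the mean-field/μP scaling; the architecture, tanh activation, shared learning rate, and toy task are illustrative assumptions:

```python
import numpy as np

# A minimal sketch (ours, not the authors' code): a two-layer network in
# the mean-field / muP parameterization, f(x) = a . phi(W x) / N.
# The 1/N output scaling (vs. 1/sqrt(N) in the NTK parameterization) is
# what lets individual features move by O(1) in the infinite-width limit.

N, D = 512, 10                       # width and input dim (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(size=(N, D))          # first-layer weights, O(1) entries
a = rng.normal(size=N)               # readout weights, O(1) entries

def forward(x):
    h = np.tanh(W @ x)               # hidden features
    return a @ h / N                 # mean-field output scaling

def gradient_step(x, y, lr=0.1):
    # One Euler step of gradient flow on the squared loss. The lr * N
    # factor compensates the 1/N in the forward pass so per-neuron
    # updates stay O(1) as N -> infinity. (A single shared rate for both
    # layers is a simplification; muP prescribes layer-wise scalings.)
    h = np.tanh(W @ x)
    err = forward(x) - y
    grad_a = err * h / N
    grad_W = err * np.outer(a * (1 - h**2), x) / N
    a[:] -= lr * N * grad_a
    W[:] -= lr * N * grad_W

for _ in range(100):                 # toy source-task pretraining loop
    x = rng.normal(size=D)
    gradient_step(x, np.sin(x[0]))
```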
The work shows that after feature learning on the source task, the network's behavior can be characterized by adaptive kernels that incorporate information from both the source data and labels. These kernels differ from fixed kernels at initialization and enable analysis of downstream task performance.
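As a rough illustration of this point (again our sketch, under an assumed two-layer architecture and toy labels), one can compare the hidden-feature kernel K(x, x') = φ(Wx)·φ(Wx')/N before and after source-task training: at initialization it is a fixed kernel, while after training it depends on the source inputs and labels:

```python
import numpy as np

# A minimal sketch (ours; the two-layer net and toy labels are
# assumptions): the hidden-feature kernel K(x, x') = phi(Wx).phi(Wx')/N
# is fixed at initialization, but after source-task training W depends
# on the source data AND labels, so the kernel adapts.

rng = np.random.default_rng(0)
N, D, n_src = 256, 10, 100
W0 = rng.normal(size=(N, D))         # first-layer weights at init
a = rng.normal(size=N)               # readout weights

X_src = rng.normal(size=(n_src, D))
y_src = np.sin(X_src[:, 0])          # toy source labels

W = W0.copy()                        # crude first-layer pretraining
for _ in range(200):
    H = np.tanh(X_src @ W.T)                               # (n_src, N)
    err = H @ a / N - y_src
    grad_W = ((err[:, None] * (1 - H**2)) * a).T @ X_src / (N * n_src)
    W -= 5.0 * N * grad_W            # width-compensated learning rate

def feature_kernel(X, W):
    H = np.tanh(X @ W.T)
    return H @ H.T / W.shape[0]

X_tgt = rng.normal(size=(20, D))     # downstream target inputs
K_init = feature_kernel(X_tgt, W0)   # fixed kernel at initialization
K_adapt = feature_kernel(X_tgt, W)   # adaptive kernel after pretraining
print(np.linalg.norm(K_adapt - K_init))   # nonzero: the kernel moved
```

Downstream predictions can then be analyzed with standard kernel methods (e.g., ridge regression) applied to K_adapt rather than K_init.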
The authors introduce tractable linear toy models that allow explicit computation of average-case test losses for transfer learning scenarios. These models reveal how data regime, task alignment, and feature learning strength determine whether transfer learning succeeds or fails, including conditions for negative transfer.
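The flavor of such an analysis can be reproduced with a short Monte Carlo experiment; the sketch below is ours, with a linear teacher, an explicit alignment parameter, and a ridge estimator shrunk toward the source solution standing in for the paper's exact toy model:

```python
import numpy as np

# A minimal linear toy-model sketch (ours; the linear task, alignment
# parameterization, and ridge-toward-source estimator are illustrative
# assumptions, not the paper's exact model). Source: y = x . w_src;
# target: y = x . w_tgt with alignment rho = cos(w_src, w_tgt). We fit
# the target either from scratch or shrunk toward the source solution
# and estimate the average-case test loss by Monte Carlo.

rng = np.random.default_rng(0)
D, n_tgt, noise = 50, 20, 0.1        # dimension, target samples, label noise

def avg_test_loss(rho, transfer_strength, trials=200):
    losses = []
    for _ in range(trials):
        w_src = rng.normal(size=D)
        w_src /= np.linalg.norm(w_src)
        u = rng.normal(size=D)            # build w_tgt with alignment rho
        u -= (u @ w_src) * w_src
        u /= np.linalg.norm(u)
        w_tgt = rho * w_src + np.sqrt(1 - rho**2) * u

        X = rng.normal(size=(n_tgt, D))
        y = X @ w_tgt + noise * rng.normal(size=n_tgt)

        w0 = transfer_strength * w_src    # prior pulled from the source task
        reg = 1.0                         # ridge regularization strength
        w_hat = w0 + np.linalg.solve(X.T @ X + reg * np.eye(D),
                                     X.T @ (y - X @ w0))
        losses.append(np.sum((w_hat - w_tgt) ** 2))  # isotropic test risk
    return np.mean(losses)

for rho in (0.9, 0.0, -0.9):
    transfer = avg_test_loss(rho, transfer_strength=1.0)
    scratch = avg_test_loss(rho, transfer_strength=0.0)
    print(f"rho={rho:+.1f}  transfer={transfer:.3f}  scratch={scratch:.3f}")
# High alignment gives positive transfer; low or negative alignment makes
# the pretrained prior hurt, i.e. negative transfer relative to scratch.
```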
Contribution Analysis
Detailed comparisons for each claimed contribution
Theory of transfer learning in infinite width feature learning networks
The authors present a theoretical framework for analyzing transfer learning in infinite-width neural networks trained with gradient flow in the mean-field/μP parameterization. This theory characterizes when and how pretraining on a source task benefits generalization on a downstream target task, covering both fine-tuning and jointly rich learning settings.
[2] Feature Learning in Infinite-Width Neural Networks
[3] Global Convergence and Rich Feature Learning in L-Layer Infinite-Width Neural Networks under μP Parametrization
[5] Efficient computation of deep nonlinear infinite-width neural networks that learn features
[7] Uniform-in-time propagation of chaos for the mean-field gradient Langevin dynamics
[8] Over-parameterised shallow neural networks with asymmetrical node scaling: Global convergence guarantees and feature learning
[13] Suitability of Modern Neural Networks for Active and Transfer Learning in Surrogate-Assisted Black-Box Optimization
[17] On the relationship between neural tangent kernel Frobenius distance and distillation sample complexity
[18] Continual Learning: Theoretical and Empirical Analysis of Infinitely Wide Neural Networks
[19] On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime (Long Version)
Adaptive kernel characterization after rich pretraining
The work shows that after feature learning on the source task, the network's behavior can be characterized by adaptive kernels that incorporate information from both the source data and labels. These kernels differ from fixed kernels at initialization and enable analysis of downstream task performance.
[30] CoAdapt: Collaborative Adaptation Between Latent EEG Feature Representation and Annotation for Emotion Decoding
[31] Data-driven intelligent condition adaptation of feature extraction for bearing fault detection using deep responsible active learning
[32] Visual domain adaptation via transfer feature learning
[33] Simulation data driven weakly supervised adversarial domain adaptation approach for intelligent cross-machine fault diagnosis
[34] Fish feeding intensity assessment method using deep learning-based analysis of feeding splashes
[35] Training neural networks as learning data-adaptive kernels: Provable representation and approximation benefits
[36] A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators
[37] KAFSTExp: Kernel Adaptive Filtering with Nystrom Approximation for Predicting Spatial Gene Expression from Histology Image
[38] Kernel Adaptive Metropolis-Hastings
[39] Adaptive kernel graph nonnegative matrix factorization
Linear toy models with explicit generalization analysis
The authors introduce tractable linear toy models that allow explicit computation of average-case test losses for transfer learning scenarios. These models reveal how data regime, task alignment, and feature learning strength determine whether transfer learning succeeds or fails, including conditions for negative transfer.