Transfer Learning in Infinite Width Feature Learning Networks
Overview
Overall Novelty Assessment
The paper develops a theory of transfer learning in infinitely wide neural networks under gradient flow, analyzing both the fine-tuning and jointly rich learning settings, in which pretraining induces adaptive kernels. It resides in the 'Tensor Programs and Parametrization Schemes' leaf, which contains only three papers in total. This is a relatively sparse research direction within the broader taxonomy of sixteen papers across eleven leaf nodes, suggesting that the paper targets a specialized intersection of feature learning theory and transfer learning that has received limited prior attention in the infinite-width literature.
The taxonomy reveals that the paper's immediate neighbors focus on parametrization methods that enable feature learning beyond kernel regimes, while sibling leaves address alternative scaling schemes and depth-dependent hyperparameter transfer. Nearby branches examine optimization dynamics through mean-field or NTK lenses, as well as application domains including adversarial robustness and computational implementations. The paper bridges feature learning theory with transfer learning applications, a connection less explored in the adjacent leaves, which either emphasize pure theoretical frameworks or domain-specific empirical studies without a transfer learning focus.
Among the twenty-nine candidates examined, the contribution-level statistics show varied novelty profiles. For the core theory of transfer learning in infinite-width feature learning networks, nine candidates were examined with zero refutations, suggesting this formulation is relatively unexplored within the limited search scope. The adaptive kernel characterization after rich pretraining was similarly checked against ten candidates without refutation. However, the linear toy models with explicit generalization analysis were checked against ten candidates and yielded one refuted match, indicating some overlap with prior analytical frameworks in simplified settings, though the transfer learning context may still differentiate the approach.
Based on this limited search of twenty-nine semantically related papers, the work appears to occupy a niche intersection with modest prior coverage. The taxonomy structure confirms sparse activity in this specific leaf, though the single refutation for the toy-model analysis counsels caution about claiming complete novelty for all technical components. The analysis does not exhaustively cover the literature beyond the top-K semantic matches, so additional related work may exist outside this scope.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors present a theoretical framework for analyzing transfer learning in infinite-width neural networks trained with gradient flow in the mean-field/μP parameterization. This theory characterizes when and how pretraining on a source task benefits generalization on a downstream target task, covering both fine-tuning and jointly rich learning settings.
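To make the parameterization concrete, here is a minimal sketch (ours, not the authors' code) of a two-layer network in the mean-field/μP scaling; the architecture, tanh activation, shared learning rate, and toy task are illustrative assumptions:

```python
import numpy as np

# A minimal sketch (ours, not the authors' code): a two-layer network in
# the mean-field / muP parameterization, f(x) = a . phi(W x) / N.
# The 1/N output scaling (vs. 1/sqrt(N) in the NTK parameterization) is
# what lets individual features move by O(1) in the infinite-width limit.

N, D = 512, 10                       # width and input dim (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(size=(N, D))          # first-layer weights, O(1) entries
a = rng.normal(size=N)               # readout weights, O(1) entries

def forward(x):
    h = np.tanh(W @ x)               # hidden features
    return a @ h / N                 # mean-field output scaling

def gradient_step(x, y, lr=0.1):
    # One Euler step of gradient flow on the squared loss. The lr * N
    # factor compensates the 1/N in the forward pass so per-neuron
    # updates stay O(1) as N -> infinity. (A single shared rate for both
    # layers is a simplification; muP prescribes layer-wise scalings.)
    h = np.tanh(W @ x)
    err = forward(x) - y
    grad_a = err * h / N
    grad_W = err * np.outer(a * (1 - h**2), x) / N
    a[:] -= lr * N * grad_a
    W[:] -= lr * N * grad_W

for _ in range(100):                 # toy source-task pretraining loop
    x = rng.normal(size=D)
    gradient_step(x, np.sin(x[0]))
```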
The work shows that after feature learning on the source task, the network's behavior can be characterized by adaptive kernels that incorporate information from both the source data and labels. These kernels differ from fixed kernels at initialization and enable analysis of downstream task performance.
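As a rough illustration of this point (again our sketch, under an assumed two-layer architecture and toy labels), one can compare the hidden-feature kernel K(x, x') = φ(Wx)·φ(Wx')/N before and after source-task training: at initialization it is a fixed kernel, while after training it depends on the source inputs and labels:

```python
import numpy as np

# A minimal sketch (ours; the two-layer net and toy labels are
# assumptions): the hidden-feature kernel K(x, x') = phi(Wx).phi(Wx')/N
# is fixed at initialization, but after source-task training W depends
# on the source data AND labels, so the kernel adapts.

rng = np.random.default_rng(0)
N, D, n_src = 256, 10, 100
W0 = rng.normal(size=(N, D))         # first-layer weights at init
a = rng.normal(size=N)               # readout weights

X_src = rng.normal(size=(n_src, D))
y_src = np.sin(X_src[:, 0])          # toy source labels

W = W0.copy()                        # crude first-layer pretraining
for _ in range(200):
    H = np.tanh(X_src @ W.T)                               # (n_src, N)
    err = H @ a / N - y_src
    grad_W = ((err[:, None] * (1 - H**2)) * a).T @ X_src / (N * n_src)
    W -= 5.0 * N * grad_W            # width-compensated learning rate

def feature_kernel(X, W):
    H = np.tanh(X @ W.T)
    return H @ H.T / W.shape[0]

X_tgt = rng.normal(size=(20, D))     # downstream target inputs
K_init = feature_kernel(X_tgt, W0)   # fixed kernel at initialization
K_adapt = feature_kernel(X_tgt, W)   # adaptive kernel after pretraining
print(np.linalg.norm(K_adapt - K_init))   # nonzero: the kernel moved
```

Downstream predictions can then be analyzed with standard kernel methods (e.g., ridge regression) applied to K_adapt rather than K_init.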
The authors introduce tractable linear toy models that allow explicit computation of average-case test losses for transfer learning scenarios. These models reveal how data regime, task alignment, and feature learning strength determine whether transfer learning succeeds or fails, including conditions for negative transfer.
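The flavor of such an analysis can be reproduced with a short Monte Carlo experiment; the sketch below is ours, with a linear teacher, an explicit alignment parameter, and a ridge estimator shrunk toward the source solution standing in for the paper's exact toy model:

```python
import numpy as np

# A minimal linear toy-model sketch (ours; the linear task, alignment
# parameterization, and ridge-toward-source estimator are illustrative
# assumptions, not the paper's exact model). Source: y = x . w_src;
# target: y = x . w_tgt with alignment rho = cos(w_src, w_tgt). We fit
# the target either from scratch or shrunk toward the source solution
# and estimate the average-case test loss by Monte Carlo.

rng = np.random.default_rng(0)
D, n_tgt, noise = 50, 20, 0.1        # dimension, target samples, label noise

def avg_test_loss(rho, transfer_strength, trials=200):
    losses = []
    for _ in range(trials):
        w_src = rng.normal(size=D)
        w_src /= np.linalg.norm(w_src)
        u = rng.normal(size=D)            # build w_tgt with alignment rho
        u -= (u @ w_src) * w_src
        u /= np.linalg.norm(u)
        w_tgt = rho * w_src + np.sqrt(1 - rho**2) * u

        X = rng.normal(size=(n_tgt, D))
        y = X @ w_tgt + noise * rng.normal(size=n_tgt)

        w0 = transfer_strength * w_src    # prior pulled from the source task
        reg = 1.0                         # ridge regularization strength
        w_hat = w0 + np.linalg.solve(X.T @ X + reg * np.eye(D),
                                     X.T @ (y - X @ w0))
        losses.append(np.sum((w_hat - w_tgt) ** 2))  # isotropic test risk
    return np.mean(losses)

for rho in (0.9, 0.0, -0.9):
    transfer = avg_test_loss(rho, transfer_strength=1.0)
    scratch = avg_test_loss(rho, transfer_strength=0.0)
    print(f"rho={rho:+.1f}  transfer={transfer:.3f}  scratch={scratch:.3f}")
# High alignment gives positive transfer; low or negative alignment makes
# the pretrained prior hurt, i.e. negative transfer relative to scratch.
```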
Contribution Analysis
Detailed comparisons for each claimed contribution
Theory of transfer learning in infinite width feature learning networks
The authors present a theoretical framework for analyzing transfer learning in infinite-width neural networks trained with gradient flow in the mean-field/μP parameterization. This theory characterizes when and how pretraining on a source task benefits generalization on a downstream target task, covering both fine-tuning and jointly rich learning settings.
[2] Feature Learning in Infinite-Width Neural Networks
[3] Global Convergence and Rich Feature Learning in L-Layer Infinite-Width Neural Networks under μP Parametrization
[5] Efficient computation of deep nonlinear infinite-width neural networks that learn features
[7] Uniform-in-time propagation of chaos for the mean-field gradient Langevin dynamics
[8] Over-parameterised shallow neural networks with asymmetrical node scaling: Global convergence guarantees and feature learning
[13] Suitability of Modern Neural Networks for Active and Transfer Learning in Surrogate-Assisted Black-Box Optimization
[17] On the relationship between neural tangent kernel Frobenius distance and distillation sample complexity
[18] Continual Learning: Theoretical and Empirical Analysis of Infinitely Wide Neural Networks
[19] On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime (Long Version)
Adaptive kernel characterization after rich pretraining
The work shows that after feature learning on the source task, the network's behavior can be characterized by adaptive kernels that incorporate information from both the source data and labels. These kernels differ from fixed kernels at initialization and enable analysis of downstream task performance.
[30] CoAdapt: Collaborative Adaptation Between Latent EEG Feature Representation and Annotation for Emotion Decoding
[31] Data-driven intelligent condition adaptation of feature extraction for bearing fault detection using deep responsible active learning
[32] Visual domain adaptation via transfer feature learning
[33] Simulation data driven weakly supervised adversarial domain adaptation approach for intelligent cross-machine fault diagnosis
[34] Fish feeding intensity assessment method using deep learning-based analysis of feeding splashes
[35] Training neural networks as learning data-adaptive kernels: Provable representation and approximation benefits
[36] A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators
[37] KAFSTExp: Kernel Adaptive Filtering with Nystrom Approximation for Predicting Spatial Gene Expression from Histology Image
[38] Kernel Adaptive Metropolis-Hastings
[39] Adaptive kernel graph nonnegative matrix factorization
Linear toy models with explicit generalization analysis
The authors introduce tractable linear toy models that allow explicit computation of average-case test losses for transfer learning scenarios. These models reveal how data regime, task alignment, and feature learning strength determine whether transfer learning succeeds or fails, including conditions for negative transfer.