Predicting Kernel Regression Learning Curves from Only Raw Data Statistics

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: kernels, kernel regression, neural tangent kernel, eigenstructure, learning curves, natural data, MLPs
Abstract:

We study kernel regression with common rotation-invariant kernels on real datasets including CIFAR-5m, SVHN, and ImageNet. We give a theoretical framework that predicts learning curves (test risk vs. sample size) from only two measurements: the empirical data covariance matrix and an empirical polynomial decomposition of the target function f_*. The key new idea is an analytical approximation of a kernel's eigenvalues and eigenfunctions with respect to an anisotropic data distribution. The eigenfunctions resemble Hermite polynomials of the data, so we call this approximation the Hermite eigenstructure ansatz (HEA). We prove the HEA for Gaussian data, but we find that real image data is often "Gaussian enough" for the HEA to hold well in practice, enabling us to predict learning curves by applying prior results relating kernel eigenstructure to test risk. Extending beyond kernel regression, we empirically find that MLPs in the feature-learning regime learn Hermite polynomials in the order predicted by the HEA. Our HEA framework is a proof of concept that an end-to-end theory of learning which maps dataset structure all the way to model performance is possible for nontrivial learning algorithms on real datasets.
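
Read literally, the abstract's pipeline starts from just two measurements. The sketch below is our illustration of one plausible way to compute them (the PCA truncation, degree cap, and least-squares estimator are our choices, not the paper's procedure): the empirical covariance directly, and the target's polynomial decomposition by regressing labels onto low-degree Hermite features of whitened PCA coordinates.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def raw_statistics(X, y, max_degree=2):
    """Sketch of the two measurements named in the abstract:
    (1) the empirical data covariance matrix, and
    (2) a polynomial decomposition of the target f_*, here estimated by
        least-squares regression of y onto low-degree Hermite features of
        the whitened top PCA coordinates.
    All concrete choices here are illustrative assumptions."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)                 # (1) empirical covariance
    lam, U = np.linalg.eigh(cov)              # eigendecomposition (ascending)
    lam, U = lam[::-1], U[:, ::-1]            # sort descending
    lam = np.clip(lam, 1e-12, None)           # guard tiny/negative eigenvalues
    k = min(20, X.shape[1])                   # keep the top-k PCA directions
    Z = (Xc @ U[:, :k]) / np.sqrt(lam[:k])    # whitened PCA coordinates
    # Probabilists' Hermite features He_1..He_max_degree, one coordinate
    # at a time (cross terms omitted to keep the sketch small).
    feats = [np.ones(len(Z))]
    for j in range(k):
        V = hermevander(Z[:, j], max_degree)  # columns: He_0..He_max_degree
        feats.extend(V[:, 1:].T)              # drop the constant column
    Phi = np.stack(feats, axis=1)
    coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # (2) target decomposition
    return cov, coeffs
```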

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces the Hermite eigenstructure ansatz (HEA) to predict kernel regression learning curves from the empirical data covariance and a polynomial decomposition of the target function. It resides in the 'Kernel Eigenstructure and Data Distribution Modeling' leaf, which contains only two papers in total. This leaf sits within the broader 'Theoretical Foundations of Kernel Regression Learning Curves' branch, indicating a relatively sparse research direction focused on mechanistic prediction frameworks rather than purely asymptotic or statistical mechanics approaches. The small sibling count suggests that this specific angle, deriving eigenstructure approximations for anisotropic real-world data, is not yet crowded.

The taxonomy reveals neighboring leaves in 'Spectral and Statistical Mechanics Approaches' (four papers) and 'Asymptotic and Power-Law Analysis' (four papers), which address generalization error through replica methods or power-law spectral decay assumptions. The original work diverges by proposing an analytical approximation (HEA) tailored to rotation-invariant kernels and anisotropic distributions, rather than relying on asymptotic limits or generic spectral decompositions. The 'Empirical Analysis and Validation' branch (four papers across two leaves) focuses on measuring exponents on benchmarks, whereas this paper aims to predict curves from raw statistics, bridging theory and empirical structure more directly.

Among the 27 candidates examined, no contribution was clearly refuted. The HEA for rotation-invariant kernels (7 candidates, 0 refutable) and the theoretical proofs for Gaussian data (10 candidates, 0 refutable) appear novel within the limited search scope. The learning curve prediction framework (10 candidates, 0 refutable) likewise shows no substantial prior overlap. The analysis does not claim exhaustive coverage, only that top-K semantic matches and citation expansion yielded no direct precedents. The sparse taxonomy leaf and zero refutations suggest that the HEA concept and its application to real image data are relatively unexplored in the examined literature.

Given the limited search scale (27 candidates) and the paper's placement in a two-paper leaf, the work appears to occupy a distinct niche within kernel regression theory. The taxonomy structure indicates that while spectral and asymptotic methods are established, mechanistic prediction from data statistics via Hermite approximations is less developed. Acknowledging the search scope, the analysis suggests the HEA framework and its empirical validation on real datasets represent a substantive contribution, though a broader literature review might reveal related ideas in adjacent fields not captured by the current taxonomy.

Taxonomy

Core-task Taxonomy Papers: 33
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: predicting kernel regression learning curves from data statistics. The field organizes around several complementary perspectives. Theoretical Foundations examines how kernel eigenstructure and data distribution shape asymptotic behavior, often drawing on statistical mechanics and spectral analysis to characterize generalization as sample size grows. Empirical Analysis and Validation tests these predictions against real datasets, documenting power-law decay and other scaling phenomena. Algorithmic Extensions explores optimization strategies and adaptive methods that exploit learning curve structure, while Kernel Design and Selection addresses how kernel choice influences curve shape. Applied Forecasting and Regression demonstrates these ideas in domains ranging from energy prediction to load forecasting, and Related Statistical and Machine Learning Methods connects kernel regression to broader themes in nonparametric estimation and neural scaling laws.

Recent work highlights a tension between universal scaling principles and task-specific structure. Studies such as Functional Scaling Laws[3] and Scaling Laws Redundancy[6] investigate how data redundancy and functional form govern asymptotic rates, while Spectral Bias Task Alignment[1] and Spectrum Dependent Curves[14] emphasize that eigenspectrum alignment between kernel and target determines convergence speed.

Predicting Kernel Learning Curves[0] sits within the theoretical branch focused on kernel eigenstructure and data distribution modeling, closely aligned with Comprehensive Learning Curve Analysis[12], which also examines how distributional properties drive predictive accuracy. Compared to purely empirical approaches such as Power Law Decay[5], the original work emphasizes deriving curve predictions directly from statistical summaries of the data, offering a more mechanistic view of how sample complexity unfolds in kernel methods.

Claimed Contributions

Hermite eigenstructure ansatz (HEA) for rotation-invariant kernels

The authors introduce an analytical approximation that expresses kernel eigenvalues and eigenfunctions in terms of Hermite polynomials of the data. This ansatz depends only on the empirical data covariance matrix and the kernel's level coefficients, enabling prediction of kernel eigenstructure without constructing or diagonalizing kernel matrices (a schematic form is sketched below).

7 retrieved papers
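
To fix ideas, the following schematic is our hedged reconstruction of the ansatz from the description above; the precise normalization and eigenvalue ordering are assumptions on our part, not statements quoted from the paper.

```latex
% Schematic HEA (our hedged reconstruction; normalization is assumed).
% Let \Sigma = \sum_i \lambda_i u_i u_i^\top be the empirical covariance,
% c_0, c_1, \ldots the kernel's level coefficients, and \sigma a
% multi-index with total degree |\sigma| = \sum_i \sigma_i. Then
\[
\phi_\sigma(x) \;\approx\; \prod_i \mathrm{He}_{\sigma_i}\!\left(\frac{u_i^\top x}{\sqrt{\lambda_i}}\right),
\qquad
\lambda_\sigma \;\approx\; c_{|\sigma|} \prod_i \lambda_i^{\sigma_i}.
\]
% Only the covariance spectrum and the level coefficients enter, so no
% kernel matrix is ever built or diagonalized.
```
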
Theoretical proofs of HEA for Gaussian data

The authors formally prove that the Hermite eigenstructure ansatz holds exactly for Gaussian data distributions in two limiting regimes: for wide Gaussian kernels and for dot-product kernels with fast-decaying level coefficients. These theorems provide rigorous justification for when the ansatz is valid (a standard example of such a kernel appears below).

10 retrieved papers
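
For concreteness, here is a standard example (ours, not the paper's) of a dot-product kernel whose level coefficients decay fast in the sense of the second regime:

```latex
% A dot-product kernel and its level decomposition (standard example).
\[
k(x, x') \;=\; \kappa\!\left(\frac{x^\top x'}{d}\right)
\;=\; \sum_{k \ge 0} c_k \left(\frac{x^\top x'}{d}\right)^{k},
\qquad
\kappa(t) = e^{t} \;\Longrightarrow\; c_k = \frac{1}{k!},
\]
% so the level coefficients decay factorially, i.e. "fast-decaying" in
% the sense the theorem requires.
```
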
Learning curve prediction framework from raw data statistics

The authors develop an end-to-end framework that maps minimal dataset statistics directly to kernel regression performance predictions. By combining the HEA with existing kernel eigenframework results, they predict test error curves using only the data covariance and a decomposition of the target function, without requiring kernel matrix construction (a minimal sketch of this stage follows below).

10 retrieved papers
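
To make the end-to-end claim concrete, the sketch below implements the second stage under our reading: it takes approximate eigenvalues and target coefficients as given (for example, HEA outputs) and evaluates the previously published omniscient risk estimate for kernel ridge regression (Bordelon et al. 2020; Canatar et al. 2021; Simon et al. 2021); no kernel matrix is formed. The function name, bisection scheme, and defaults are our illustration, not the paper's code.

```python
import numpy as np

def predict_learning_curve(eigvals, target_coeffs, ns, ridge=1e-8, noise_var=0.0):
    """Hedged sketch: predicted test MSE at each sample size in `ns`,
    given approximate kernel eigenvalues and the target's coefficients
    in the eigenbasis, via the published omniscient risk estimate."""
    lam = np.asarray(eigvals, dtype=float)
    v2 = np.asarray(target_coeffs, dtype=float) ** 2
    risks = []
    for n in ns:
        # Effective regularization kappa solves
        #   n = sum_i lam_i / (lam_i + kappa) + ridge / kappa.
        # The right-hand side decreases in kappa, so bisect in log space.
        lo, hi = 1e-15, 1e15
        for _ in range(200):
            mid = np.sqrt(lo * hi)
            rhs = np.sum(lam / (lam + mid)) + ridge / mid
            if rhs > n:
                lo = mid  # need a larger kappa
            else:
                hi = mid
        kappa = np.sqrt(lo * hi)
        L = lam / (lam + kappa)           # modewise learnability in [0, 1]
        e0 = n / (n - np.sum(L ** 2))     # overfitting coefficient
        risks.append(e0 * (np.sum((1.0 - L) ** 2 * v2) + noise_var))
    return np.array(risks)

# Illustrative usage with a synthetic power-law eigenstructure:
lam = 1.0 / np.arange(1, 2001) ** 2.0
v = 1.0 / np.arange(1, 2001) ** 1.5
print(predict_learning_curve(lam, v, ns=[10, 100, 1000]))
```

Bisection is done in log space because the effective regularization kappa can span many orders of magnitude as the sample size grows.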

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Hermite eigenstructure ansatz (HEA) for rotation-invariant kernels

Contribution

Theoretical proofs of HEA for Gaussian data

Contribution

Learning curve prediction framework from raw data statistics