Directional Convergence and Benign Overfitting of Gradient Descent in Leaky ReLU Two-Layer Neural Networks
Overview
Overall Novelty Assessment
The paper establishes directional convergence for gradient descent (not just gradient flow) in leaky ReLU two-layer networks and derives classification error bounds that reveal a phase transition in benign overfitting. It sits in the 'Gradient Descent Training Dynamics' leaf under 'ReLU and Leaky ReLU Networks', a leaf containing four papers in total. Within the broader taxonomy of 34 papers across the field, this is a moderately populated research direction, indicating focused but not overcrowded attention to gradient descent dynamics in ReLU-type networks.
The taxonomy shows this leaf is one of three under 'ReLU and Leaky ReLU Networks', with sibling leaves examining 'Hinge Loss and Margin Maximization' (three papers) and 'Logistic Loss and Classification' (two papers). Neighboring branches explore 'Convolutional Neural Networks' (four papers) and 'Linear and Smooth Activation Networks' (three papers). The scope_note clarifies this leaf focuses specifically on directional convergence under gradient descent/flow, excluding alternative loss functions. The paper's extension from gradient flow to gradient descent represents a technical advance within this established research direction.
Among 26 candidates examined, the contribution on directional convergence for gradient descent (10 candidates, 0 refutable) and the phase transition discovery (10 candidates, 0 refutable) appear novel within the limited search scope. However, the claim of extending results to 'broader data settings' (6 candidates examined, 2 refutable) shows more substantial overlap with prior work. These statistics suggest the first two contributions face less direct competition among the examined candidates, while the data-generality claim encounters existing work addressing similar mixture or non-orthogonal data scenarios.
Based on the top-26 semantic matches examined, the technical contributions on gradient descent convergence and phase transitions appear relatively novel, while the data setting extension shows clearer overlap with prior work. The analysis covers a focused subset of the literature; a broader search might reveal additional related work, particularly in the 'Data Characteristics and Noise Models' branch (nine papers), which was not the primary focus of this candidate examination.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors establish directional convergence of gradient descent for leaky ReLU two-layer neural networks trained on mixture data with exponential loss, providing a precise characterization of the convergent direction. This is the first such result for ReLU-type networks under gradient descent, extending beyond prior work limited to gradient flow or nearly orthogonal data.
The authors derive classification error bounds for the convergent direction that reveal a phase transition between weak-signal and strong-signal regimes. They provide both upper and lower bounds for Gaussian mixtures, showing when benign overfitting occurs and when it provably fails even under directional convergence.
The authors extend benign overfitting results beyond the nearly orthogonal data regime studied in prior work to general mixture data settings, including polynomially tailed distributions. Their deterministic conditions make it possible to prove benign overfitting with high probability under distributional assumptions weaker than the sub-Gaussian requirements of previous analyses.
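To make the first claimed contribution concrete, the sketch below trains a toy two-layer leaky ReLU network with plain gradient descent on exponential loss over a Gaussian mixture and checks that the normalized parameter vector stabilizes, i.e., directional convergence. All specifics (dimensions, step size, fixed ±1 output weights, the mixture parameters) are illustrative assumptions, not the paper's actual setting or proof technique.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture data: y = ±1, x = y*mu + Gaussian noise (illustrative setting,
# loosely echoing the mixture models in this literature).
n, d = 40, 20
mu = np.zeros(d)
mu[0] = 3.0
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu[None, :] + rng.normal(size=(n, d))

# Two-layer leaky ReLU net with m hidden units and fixed ±1/m output weights,
# a common simplification in this line of work.
m, alpha = 8, 0.1
a = np.where(np.arange(m) < m // 2, 1.0, -1.0) / m
W = rng.normal(size=(m, d)) * 0.01


def leaky(z):
    return np.where(z > 0, z, alpha * z)


def leaky_grad(z):
    return np.where(z > 0, 1.0, alpha)


def forward(W, X):
    # f(x) = sum_k a_k * leaky(w_k . x)
    return leaky(X @ W.T) @ a


lr = 0.1
dirs = []  # snapshots of the normalized parameter vector
for t in range(4000):
    margins = y * forward(W, X)
    # exponential loss L = mean(exp(-y * f(x))); coef = dL/df per sample
    coef = -y * np.exp(-margins) / n
    Z = X @ W.T
    G = (coef[:, None] * leaky_grad(Z) * a[None, :]).T @ X  # dL/dW
    W -= lr * G
    if t % 500 == 0:
        dirs.append(W.flatten() / np.linalg.norm(W))

# Directional convergence: late normalized iterates barely move,
# even though the unnormalized norm keeps growing under exponential loss.
cos_late = dirs[-1] @ dirs[-2]
print(f"cosine between late normalized iterates: {cos_late:.4f}")
```

On this separable toy mixture the cosine between successive normalized snapshots approaches 1, which is the phenomenon the paper proves rigorously (with a characterization of the limit direction) for gradient descent rather than gradient flow.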
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[3] Benign Overfitting for Two-layer ReLU Networks
[5] Benign overfitting for regression with trained two-layer ReLU networks
[18] Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Contribution Analysis
Detailed comparisons for each claimed contribution
Directional convergence of gradient descent in leaky ReLU two-layer neural networks
The authors establish directional convergence of gradient descent for leaky ReLU two-layer neural networks trained on mixture data with exponential loss, providing a precise characterization of the convergent direction. This is the first such result for ReLU-type networks under gradient descent, extending beyond prior work limited to gradient flow or nearly orthogonal data.
[4] Benign Overfitting in Two-layer ReLU Convolutional Neural Networks
[43] Topological obstruction to the training of shallow ReLU neural networks
[44] Gradient descent on two-layer nets: Margin maximization and simplicity bias
[45] Feature selection and low test error in shallow low-rotation ReLU networks
[46] Towards understanding learning in neural networks with linear teachers
[47] SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data
[48] Learning a neuron by a shallow ReLU network: Dynamics and implicit bias for correlated inputs
[49] Non-Singularity of the Gradient Descent map for Neural Networks with Piecewise Analytic Activations
[50] Training two-layer ReLU networks with gradient descent is inconsistent
[51] The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks
Classification error bounds revealing phase transition in benign overfitting
The authors derive classification error bounds for the convergent direction that reveal a phase transition between weak-signal and strong-signal regimes. They provide both upper and lower bounds for Gaussian mixtures, showing when benign overfitting occurs and when it provably fails even under directional convergence.
[6] Rethinking Benign Overfitting in Two-Layer Neural Networks
[8] Benign overfitting in leaky ReLU networks with moderate input dimension
[16] Unveil benign overfitting for transformer in vision: Training dynamics, convergence, and generalization
[21] Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
[35] Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models
[36] Universal scaling laws of absorbing phase transitions in artificial deep neural networks
[37] Benign overfitting of non-smooth neural networks beyond lazy training
[38] Benign overfitting in adversarially robust linear classification
[39] Understanding generalization in transformers: Error bounds and training dynamics under benign and harmful overfitting
[40] Benign, tempered, or catastrophic: Toward a refined taxonomy of overfitting
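The weak-/strong-signal phase transition claimed above can be illustrated with a simplified linear proxy: for a Gaussian mixture x = y·mu + N(0, I), the signal-averaging classifier w = (1/n) Σᵢ yᵢxᵢ has test error Φ(−(w·mu)/‖w‖), and sweeping the signal strength ‖mu‖ shows error dropping from near chance to near zero. This is a hedged analogy to (not a reproduction of) the paper's bounds; all numbers below are arbitrary illustration choices.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)


def gauss_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Gaussian mixture x = y*mu + N(0, I); classifier w = mean(y_i * x_i).
# Exact test error of sign(w . x) on this mixture is Phi(-(w.mu)/||w||).
n, d, trials = 50, 500, 20
signals = [0.25, 0.5, 1.0, 2.0, 4.0]  # candidate values of ||mu||
errors = []
for s in signals:
    errs = []
    for _ in range(trials):
        mu = np.zeros(d)
        mu[0] = s
        y = rng.choice([-1.0, 1.0], size=n)
        X = y[:, None] * mu[None, :] + rng.normal(size=(n, d))
        w = (y[:, None] * X).mean(axis=0)
        errs.append(gauss_cdf(-(w @ mu) / np.linalg.norm(w)))
    errors.append(float(np.mean(errs)))

for s, e in zip(signals, errors):
    print(f"||mu|| = {s:4.2f}  ->  test error ~ {e:.3f}")
```

With d/n large, weak signal leaves w dominated by noise (error near 1/2), while strong signal drives the error toward zero; the paper's contribution is a rigorous version of this transition, with matching lower bounds, for the direction gradient descent actually converges to.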
Extension of benign overfitting results to broader data settings
The authors extend benign overfitting results beyond the nearly orthogonal data regime studied in prior work to general mixture data settings, including polynomially tailed distributions. Their deterministic conditions make it possible to prove benign overfitting with high probability under distributional assumptions weaker than the sub-Gaussian requirements of previous analyses.
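A quick way to see why moving beyond sub-Gaussian assumptions matters: Student-t noise with 3 degrees of freedom is polynomially tailed, yet a signal-aligned classifier can still interpolate-and-generalize on such a mixture. The sketch below is only a plausibility check under assumed parameters (sample sizes, signal strength, the averaging estimator); it is not the paper's deterministic conditions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Heavy-tailed mixture: x = y*mu + noise with iid Student-t(3) coordinates,
# which is polynomially tailed rather than sub-Gaussian (illustrative of the
# broader data settings this contribution targets).
n, d, df = 200, 100, 3.0
mu = np.zeros(d)
mu[0] = 5.0
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu[None, :] + rng.standard_t(df, size=(n, d))

# Signal-aligned estimator and its accuracy on the (noisy) training set.
w = (y[:, None] * X).mean(axis=0)
train_acc = np.mean(np.sign(X @ w) == y)

# Fresh heavy-tailed test set drawn from the same mixture.
y_te = rng.choice([-1.0, 1.0], size=n)
X_te = y_te[:, None] * mu[None, :] + rng.standard_t(df, size=(n, d))
test_acc = np.mean(np.sign(X_te @ w) == y_te)
print(f"train acc {train_acc:.2f}, test acc {test_acc:.2f}")
```

Despite occasional large noise coordinates from the heavy tails, both accuracies stay high when the signal is strong enough relative to the noise correlations, which is the qualitative regime the paper's weaker distributional assumptions are designed to capture.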