Directional Convergence and Benign Overfitting of Gradient Descent in Leaky ReLU Two-Layer Neural Networks

ICLR 2026 Conference Submission (Anonymous Authors)
Keywords: benign overfitting, implicit bias, neural networks, classification
Abstract:

In this paper, we provide sufficient conditions for benign overfitting in fixed-width leaky ReLU two-layer neural network classifiers trained on mixture data via gradient descent. Our results are derived by establishing directional convergence of the network parameters and a classification error bound for the convergent direction. The error bound also reveals a previously unidentified phase transition. Directional convergence in (leaky) ReLU neural networks was previously established only for gradient flow; lacking such a result, earlier work on benign overfitting was limited to networks trained on nearly orthogonal data. All of our results hold for mixture data, a broader setting than the nearly orthogonal data considered in prior work. We demonstrate our findings by showing that benign overfitting occurs with high probability in a much wider range of scenarios than previously known. Our results also allow us to characterize cases in which benign overfitting provably fails even when directional convergence occurs. Our work thus provides a more complete picture of benign overfitting in leaky ReLU two-layer neural networks.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper establishes directional convergence for gradient descent (not just gradient flow) in leaky ReLU two-layer networks and derives classification error bounds revealing a phase transition in benign overfitting. It sits in the 'Gradient Descent Training Dynamics' leaf under 'ReLU and Leaky ReLU Networks', which contains four papers total. This is a moderately populated research direction within the broader taxonomy of 34 papers across the field, indicating focused but not overcrowded attention to gradient descent dynamics in ReLU-type networks.

The taxonomy shows this leaf is one of three under 'ReLU and Leaky ReLU Networks', with sibling leaves examining 'Hinge Loss and Margin Maximization' (three papers) and 'Logistic Loss and Classification' (two papers). Neighboring branches explore 'Convolutional Neural Networks' (four papers) and 'Linear and Smooth Activation Networks' (three papers). The leaf's scope note clarifies that it focuses specifically on directional convergence under gradient descent/flow, excluding alternative loss functions. The paper's extension from gradient flow to gradient descent represents a technical advance within this established research direction.

Among the 26 candidates examined, the contribution on directional convergence for gradient descent (10 candidates, 0 refutable) and the phase-transition discovery (10 candidates, 0 refutable) appear novel within the limited search scope. However, the claim of extending results to 'broader data settings' (6 candidates examined, 2 refutable) shows more substantial overlap with prior work. These statistics suggest that the first two contributions face less direct competition among the examined candidates, while the data-generality claim encounters existing work addressing similar mixture or non-orthogonal data scenarios.

Based on the top-26 semantic matches examined, the technical contributions on gradient descent convergence and phase transitions appear relatively novel, while the data setting extension shows clearer overlap with prior work. The analysis covers a focused subset of the literature; a broader search might reveal additional related work, particularly in the 'Data Characteristics and Noise Models' branch (nine papers) which was not the primary focus of this candidate examination.

Taxonomy

Core-task Taxonomy Papers: 34
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 2

Research Landscape Overview

Core task: benign overfitting in two-layer neural networks. This field investigates the phenomenon where overparameterized shallow networks interpolate noisy training data yet still generalize well on test data, defying classical statistical intuition.

The taxonomy organizes research into several main branches: Activation Function and Architecture Variants explore how different nonlinearities (ReLU, leaky ReLU) and architectural choices influence benign overfitting; Data Characteristics and Noise Models examine the role of label noise, feature structure, and sample complexity; Adversarial Robustness and Security study whether benign overfitting persists under adversarial perturbations; Generalization Theory and Implicit Regularization analyze the implicit biases of gradient-based training that enable good generalization despite interpolation; and Extended Architectures and Generalizations broaden the scope to convolutional networks, transformers, and deeper models.

Representative works such as Benign Overfitting ReLU[3] and Benign Overfitting Leaky ReLU[8] illustrate how activation choices shape the training dynamics, while studies like Benign Overfitting Adversarial[2] and Benign Overfitting Noisy Features[26] highlight the interplay between data properties and overfitting behavior. A particularly active line of work focuses on gradient descent dynamics with ReLU-type activations, examining how directional convergence and implicit bias lead networks toward max-margin solutions that generalize despite a perfect training fit. Directional Convergence Leaky ReLU[0] sits squarely within this branch, analyzing how leaky ReLU networks trained by gradient descent exhibit directional convergence properties that facilitate benign overfitting.
This work closely relates to Benign Overfitting ReLU[3], which establishes foundational results for standard ReLU networks, and contrasts with Benign Overfitting Regression[5], which explores similar phenomena in simpler regression settings without the complexities of nonlinear activations. Meanwhile, other branches investigate whether benign overfitting extends to adversarially robust training or whether it breaks down under distribution shift, revealing trade-offs between interpolation, generalization, and robustness. Open questions remain about the precise conditions under which benign overfitting occurs, the role of initialization and architecture depth, and how these insights scale to practical deep learning scenarios.

Claimed Contributions

Directional convergence of gradient descent in leaky ReLU two-layer neural networks

The authors establish directional convergence of gradient descent for leaky ReLU two-layer neural networks trained on mixture data with exponential loss, providing precise characterization of the convergent direction. This is the first such result for ReLU-type networks under gradient descent, extending beyond prior work limited to gradient flow or nearly orthogonal data.

10 retrieved papers
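The claimed setting can be illustrated numerically. The sketch below is illustrative only: the width, data scales, learning rate, and the common simplification of a fixed second layer are our assumptions, not the paper's setup. It trains the inner layer of a two-layer leaky ReLU network by gradient descent on the exponential loss over synthetic mixture data and tracks the cosine similarity between successive normalized parameter vectors, which approaches 1 as the parameter direction stabilizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic mixture data: x_i = y_i * mu + Gaussian noise (illustrative scales)
n, d, m = 20, 50, 8                # samples, input dimension, hidden width
mu = np.zeros(d)
mu[0] = 3.0
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu + 0.5 * rng.standard_normal((n, d))

alpha = 0.1                                        # leaky ReLU slope
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed outer layer
W = 0.01 * rng.standard_normal((m, d))             # trained inner layer

def forward(W, X):
    Z = X @ W.T                                    # pre-activations, shape (n, m)
    return np.where(Z > 0, Z, alpha * Z) @ a       # network outputs, shape (n,)

lr, dirs = 0.05, []
for t in range(4000):
    Z = X @ W.T
    act_grad = np.where(Z > 0, 1.0, alpha)         # leaky ReLU derivative
    loss_grad = -y * np.exp(-y * forward(W, X))    # exponential loss derivative
    G = (act_grad * loss_grad[:, None] * a).T @ X  # dL/dW, shape (m, d)
    W -= lr * G / n
    if t % 1000 == 999:
        dirs.append(W.ravel() / np.linalg.norm(W))

# Directional convergence: successive normalized parameter vectors align
cosines = [float(u @ v) for u, v in zip(dirs, dirs[1:])]
print(cosines)
```

On such separable mixture data the cosines climb toward 1, consistent with the directional convergence the paper analyzes; the paper's theorems, of course, characterize the limit direction precisely rather than observing it empirically.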
Classification error bounds revealing phase transition in benign overfitting

The authors derive classification error bounds for the convergent direction that reveal a phase transition between weak signal and strong signal regimes. They provide both upper and lower bounds for Gaussian mixtures, showing when benign overfitting occurs or provably fails even with directional convergence.

10 retrieved papers
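For intuition about a weak-signal/strong-signal transition (a textbook baseline, not the paper's bound): for the symmetric Gaussian mixture x = y*mu + N(0, I), the linear classifier sign(<mu, x>) has error Phi(-||mu||), which moves from near 1/2 for weak signal to near 0 for strong signal. The dimensions and signal strengths below are arbitrary choices for illustration.

```python
import numpy as np
from math import erf, sqrt

def gauss_mix_error(signal):
    # For x = y*mu + N(0, I) with ||mu|| = signal, sign(<mu, x>) errs
    # exactly when a standard normal falls below -||mu||, so
    # error = Phi(-||mu||).
    return 0.5 * (1.0 + erf(-signal / sqrt(2.0)))

# Monte Carlo check of the closed form
rng = np.random.default_rng(1)
d, n = 50, 20000
for s in [0.1, 1.0, 3.0]:
    mu = np.zeros(d)
    mu[0] = s
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + rng.standard_normal((n, d))
    emp = float(np.mean(np.sign(X @ mu) != y))
    print(f"signal={s}: closed-form={gauss_mix_error(s):.4f}, empirical={emp:.4f}")
```

The paper's contribution is sharper than this baseline: it bounds the error of the direction that gradient descent actually converges to, and shows both when that error vanishes and when overfitting provably fails to be benign.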
Extension of benign overfitting results to broader data settings

The authors extend benign overfitting results beyond the nearly orthogonal data regime studied in prior work to general mixture data settings, including polynomially tailed distributions. Their deterministic conditions allow proving benign overfitting with high probability under weaker distributional assumptions than previous sub-Gaussian requirements.

6 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Directional convergence of gradient descent in leaky ReLU two-layer neural networks

The authors establish directional convergence of gradient descent for leaky ReLU two-layer neural networks trained on mixture data with exponential loss, providing precise characterization of the convergent direction. This is the first such result for ReLU-type networks under gradient descent, extending beyond prior work limited to gradient flow or nearly orthogonal data.

Contribution

Classification error bounds revealing phase transition in benign overfitting

The authors derive classification error bounds for the convergent direction that reveal a phase transition between weak signal and strong signal regimes. They provide both upper and lower bounds for Gaussian mixtures, showing when benign overfitting occurs or provably fails even with directional convergence.

Contribution

Extension of benign overfitting results to broader data settings

The authors extend benign overfitting results beyond the nearly orthogonal data regime studied in prior work to general mixture data settings, including polynomially tailed distributions. Their deterministic conditions allow proving benign overfitting with high probability under weaker distributional assumptions than previous sub-Gaussian requirements.