Corner Gradient Descent

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: mini-batch stochastic gradient descent, momentum, sampling noise, convergence rates, acceleration, power laws, phase diagram, contour integration, rational approximations, asymptotic methods, MNIST, frequency response function
Abstract:

We consider SGD-type optimization on infinite-dimensional quadratic problems with power-law spectral conditions. It is well known that on such problems deterministic GD has loss convergence rate L_t = O(t^{-ζ}), which can be improved to L_t = O(t^{-2ζ}) by using Heavy Ball with a non-stationary Jacobi-based schedule (and the latter rate is optimal among fixed schedules). However, in the mini-batch Stochastic GD setting, the sampling noise causes the Jacobi HB to diverge; accordingly, no O(t^{-2ζ}) algorithm is known. In this paper we show that rates up to O(t^{-2ζ}) can be achieved by a generalized stationary SGD with infinite memory. We start by identifying generalized (S)GD algorithms with contours in the complex plane. We then show that contours having a corner with external angle θπ accelerate the plain GD rate O(t^{-ζ}) to O(t^{-θζ}). For deterministic GD, increasing θ allows rates arbitrarily close to O(t^{-2ζ}) to be achieved. However, in Stochastic GD, increasing θ also amplifies the sampling noise, so in general θ needs to be optimized by balancing the acceleration and noise effects. We prove that the optimal rate is given by θ_max = min(2, ν, 2/(ζ + 1/ν)), where ν, ζ are the exponents appearing in the capacity and source spectral conditions. Furthermore, using fast rational approximations of power functions, we show that ideal corner algorithms can be efficiently approximated by practical finite-memory algorithms.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes generalized stationary SGD algorithms with infinite memory that achieve convergence rates up to O(t^{-2ζ}) on infinite-dimensional quadratic problems with power-law spectra. It resides in the 'Generalized Stationary SGD with Infinite Memory' leaf under 'Momentum and Acceleration Techniques', where it is currently the sole paper. This placement indicates a sparse research direction within the broader momentum-based acceleration landscape, suggesting the infinite-memory perspective on SGD acceleration is relatively unexplored compared to finite-memory momentum methods or explicit preconditioning approaches.

The taxonomy reveals neighboring work in adjacent leaves: Heavy Ball Momentum Under Gradient Noise addresses anisotropic noise effects, Nesterov Acceleration for Stochastic Settings focuses on interpolation-based guarantees, and Krylov Subspace Acceleration projects gradients onto subspaces. The paper's contour-based framework and corner algorithms diverge from these by framing acceleration through complex-plane geometry rather than classical momentum schedules or subspace projections. The taxonomy's exclusion note clarifies that finite-memory momentum methods belong elsewhere, positioning this work as fundamentally distinct in its use of an infinite weighted gradient history to handle power-law spectral conditions.

Among 23 candidates examined across three contributions, none were found to clearly refute the proposed ideas. The contour view examined 10 candidates with no refutations, corner algorithms with acceleration factor θ examined 3 candidates with none refuting, and finite-memory approximations examined 10 candidates with no overlapping prior work identified. This suggests that within the limited search scope—focused on top semantic matches and citation expansion—the specific combination of contour-based analysis, corner geometry for acceleration, and infinite-memory stationary schemes appears novel. The absence of sibling papers in the same taxonomy leaf reinforces that this particular framing has not been extensively studied.

The analysis covers a targeted literature search rather than an exhaustive survey of all optimization methods. The taxonomy structure shows the field has many alternative acceleration strategies (preconditioning, finite-memory momentum, learning rate schedules), but the infinite-memory stationary perspective with complex-plane contours represents a distinct angle. The limited candidate pool and sparse taxonomy leaf suggest this direction is relatively underexplored, though the search scope does not rule out related work in adjacent mathematical optimization communities not captured by the semantic search.

Taxonomy

Core-task Taxonomy Papers: 33
Claimed Contributions: 3
Contribution Candidate Papers Compared: 23
Refutable Papers: 0

Research Landscape Overview

Core task: accelerating stochastic gradient descent on ill-conditioned quadratic problems. The field addresses the challenge of slow convergence when the objective's Hessian has widely varying eigenvalues, a common bottleneck in optimization. The taxonomy reveals several complementary strategies. Preconditioning and Adaptive Scaling Methods reshape the geometry by transforming the problem space, often using second-order information or sketching techniques like those in Preconditioned SGD[1] and Sketched Conditioning[26].

Momentum and Acceleration Techniques exploit temporal structure in gradient estimates to dampen oscillations and speed convergence, with works ranging from classical heavy-ball variants to more recent noise-adaptive schemes such as Noise-adaptive Heavy-Ball[30]. Learning Rate Scheduling and Restart Strategies, exemplified by Warm Restarts[2], dynamically adjust step sizes or periodically reset momentum to escape plateaus. Distributed and Federated Optimization tackles ill-conditioning in decentralized settings, as seen in Local SGD[6] and Local SGD Quadratic[28]. Problem-Specific Acceleration tailors methods to particular structures like matrix completion or fine-tuning tasks, Theoretical Analysis and Convergence Guarantees provides rigorous understanding of convergence rates under various noise and conditioning assumptions, and Empirical Studies validate these approaches on real applications.

A particularly active line of research explores how to incorporate long-range gradient history without prohibitive memory costs, balancing the benefits of infinite-memory schemes against computational overhead. Corner Gradient Descent[0] sits within the Momentum and Acceleration Techniques branch, specifically under Generalized Stationary SGD with Infinite Memory, emphasizing methods that maintain weighted averages of past gradients to implicitly precondition updates. This contrasts with explicit preconditioning approaches like Kalman SGD[3], which uses Kalman filtering to estimate curvature, and differs from finite-memory momentum methods that discard older information. The interplay between memory depth, noise robustness, and per-iteration cost remains a central trade-off: while infinite-memory strategies can better capture global curvature, they must carefully manage numerical stability and computational complexity, especially when condition numbers vary over time, as studied in Varying Condition Numbers[8].

Claimed Contributions

Contour view of generalized (S)GD algorithms

The authors establish that stationary SGD algorithms with arbitrary linear memory can be represented as contours in the complex plane through a response map Ψ. This geometric viewpoint allows algorithm design to be reformulated as contour design, where the loss evolution is determined by the contour properties.

10 retrieved papers
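The contour/response-map machinery itself is not reproduced in this report, but the algorithm family it describes (stationary SGD with linear memory) is easy to illustrate. Below is a minimal NumPy sketch of its simplest member, classical heavy ball with one step of memory, run as deterministic GD on a diagonal quadratic with a power-law spectrum λ_k = k^{-ν}. The spectrum, step size, and momentum values here are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Diagonal quadratic with power-law spectrum (illustrative: nu = 1, 50 modes)
nu = 1.0
k = np.arange(1, 51)
lam = k ** (-nu)                  # eigenvalues lambda_k = k^{-nu}
w_star = np.ones_like(lam)        # minimizer of the quadratic

def loss(w):
    return 0.5 * np.sum(lam * (w - w_star) ** 2)

def heavy_ball(alpha=0.9, beta=0.5, steps=500):
    """Stationary GD with one-step linear memory (classical heavy ball)."""
    w = np.zeros_like(lam)
    v = np.zeros_like(lam)        # memory state: exponentially weighted past gradients
    history = [loss(w)]
    for _ in range(steps):
        g = lam * (w - w_star)    # exact gradient of the quadratic
        v = beta * v - alpha * g  # same linear update applied at every step
        w = w + v
        history.append(loss(w))
    return w, history

w, history = heavy_ball()
```

A stationary algorithm applies the same linear update at every step; the paper's response map Ψ characterizes how such a recurrence acts on each eigenvalue λ, which is what turns algorithm design into contour design in the complex plane.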
Corner algorithms with acceleration factor θ

The authors introduce corner algorithms based on contours with corners having external angle θπ (1 < θ < 2). They prove these algorithms accelerate convergence from O(t−ζ) to O(t−θζ) and establish a complete phase diagram showing the maximum achievable acceleration θmax = min(2, ν, 2/(ζ+1/ν)) in the signal-dominated regime.

3 retrieved papers
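The phase-diagram formula above is simple to evaluate. The sketch below (with hypothetical helper names) computes θ_max = min(2, ν, 2/(ζ + 1/ν)) and the resulting loss exponent θ_max · ζ; plain (S)GD corresponds to θ = 1 and the deterministic optimum to θ = 2.

```python
def theta_max(zeta, nu):
    """Maximal corner acceleration factor in the signal-dominated regime:
    theta_max = min(2, nu, 2 / (zeta + 1/nu))."""
    if zeta <= 0 or nu <= 0:
        raise ValueError("exponents zeta and nu must be positive")
    return min(2.0, nu, 2.0 / (zeta + 1.0 / nu))

def best_loss_exponent(zeta, nu):
    """Loss decays as O(t^{-theta * zeta}); theta = 1 is plain (S)GD."""
    return theta_max(zeta, nu) * zeta

# Example: with zeta = 0.5, nu = 2 the noise terms do not bind and the full
# deterministic acceleration theta = 2 is attainable; with zeta = nu = 1
# the formula gives theta_max = 1, i.e. no acceleration over plain SGD.
```

This makes the trade-off in the contribution concrete: larger ν (faster eigenvalue decay) leaves room for acceleration, while the noise-driven term 2/(ζ + 1/ν) caps how sharp a corner can be before sampling noise dominates.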
Finite-memory approximation of corner algorithms

The authors show that ideal corner algorithms, which require infinite memory, can be efficiently approximated by practical finite-memory algorithms. This is achieved using fast rational approximations of power functions with error bounds of O(e−c√M), making the theoretical corner algorithms implementable in practice.

10 retrieved papers
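The paper's own rational-approximation construction is not reproduced here. As a loosely related, self-contained illustration of why power laws compress well into finite memory, the NumPy sketch below approximates t^{-α} by a sum of M decaying exponentials, obtained by trapezoidal quadrature of the Gamma integral t^{-α} = (1/Γ(α)) ∫₀^∞ s^{α-1} e^{-ts} ds after the substitution s = e^u; the grid limits and M are illustrative assumptions.

```python
import numpy as np
from math import gamma

def power_kernel_nodes(alpha, M=80, u_min=-20.0, u_max=3.0):
    """Nodes s_j and weights w_j such that t^{-alpha} ~ sum_j w_j exp(-t s_j),
    from trapezoidal quadrature of the Gamma integral after s = e^u."""
    u = np.linspace(u_min, u_max, M)
    h = u[1] - u[0]
    s = np.exp(u)                              # exponential decay rates
    w = h * np.exp(alpha * u) / gamma(alpha)   # quadrature weights
    w[0] *= 0.5
    w[-1] *= 0.5                               # trapezoid endpoint halving
    return s, w

def approx_power(t, s, w):
    """Finite sum-of-exponentials surrogate for t^{-alpha}."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.exp(-np.outer(t, s)) @ w
```

For α = 0.7 and M = 80 nodes this surrogate matches t^{-0.7} to several digits over t ∈ [1, 50]. Each exponential costs one scalar of recurrent state, which is the general sense in which long power-law memory admits compact finite-memory approximations; the paper's rational approximations achieve this with the much stronger O(e^{-c√M}) error bound.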

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the currently retrieved top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Contour view of generalized (S)GD algorithms

The authors establish that stationary SGD algorithms with arbitrary linear memory can be represented as contours in the complex plane through a response map Ψ. This geometric viewpoint allows algorithm design to be reformulated as contour design, where the loss evolution is determined by the contour properties.

Contribution

Corner algorithms with acceleration factor θ

The authors introduce corner algorithms based on contours with corners having external angle θπ (1 < θ < 2). They prove these algorithms accelerate convergence from O(t−ζ) to O(t−θζ) and establish a complete phase diagram showing the maximum achievable acceleration θmax = min(2, ν, 2/(ζ+1/ν)) in the signal-dominated regime.

Contribution

Finite-memory approximation of corner algorithms

The authors show that ideal corner algorithms, which require infinite memory, can be efficiently approximated by practical finite-memory algorithms. This is achieved using fast rational approximations of power functions with error bounds of O(e−c√M), making the theoretical corner algorithms implementable in practice.