Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: convex optimization, adaptive optimization, gradient methods, accelerated methods
Abstract:

In this paper, we focus on the problem of minimizing a continuously differentiable convex objective function, $\min_x f(x)$. Recently, Malitsky (2020); Alacaoglu et al. (2023) developed an adaptive first-order method, GRAAL. This algorithm computes stepsizes by estimating the local curvature of the objective function without any line search procedures or hyperparameter tuning, and attains the standard iteration complexity $\mathcal{O}(L\Vert x_0-x^*\Vert^2/\epsilon)$ of fixed-stepsize gradient descent for $L$-smooth functions. However, a natural question arises: is it possible to accelerate the convergence of GRAAL to match the optimal complexity $\mathcal{O}(\sqrt{L\Vert x_0-x^*\Vert^2/\epsilon})$ of the accelerated gradient descent of Nesterov (1983)? Although some attempts have been made by Li and Lan (2025); Suh and Ma (2025), the ability of existing accelerated algorithms to adapt to the local curvature of the objective function is highly limited. We resolve this issue and develop GRAAL with Nesterov acceleration, which can adapt its stepsize to the local curvature at a geometric, or linear, rate just like non-accelerated GRAAL. We demonstrate the adaptive capabilities of our algorithm by proving that it achieves near-optimal iteration complexities for $L$-smooth functions, as well as under a more general $(L_0,L_1)$-smoothness assumption (Zhang et al., 2019).
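The abstract describes GRAAL's key feature: stepsizes derived from a local curvature estimate, with no line search. The sketch below is not GRAAL itself but a minimal illustration of the idea, in the spirit of related adaptive rules (e.g., adaptive gradient descent without descent): the stepsize may grow geometrically but is capped by half the inverse of a finite-difference curvature estimate. All names (`curvature_adaptive_gd`, `lam0`, `theta`) are mine, not from the paper.

```python
import numpy as np

def curvature_adaptive_gd(grad, x0, iters=200, lam0=1e-3):
    """Illustrative curvature-adaptive gradient descent (a sketch, NOT
    GRAAL's exact update): the stepsize lam may grow geometrically from
    step to step, but is capped by half the inverse of the local
    curvature estimate L_k ~ ||g_k - g_{k-1}|| / ||x_k - x_{k-1}||."""
    x_prev, g_prev = x0, grad(x0)
    lam, theta = lam0, 1.0
    x = x_prev - lam * g_prev            # plain first gradient step
    for _ in range(iters):
        g = grad(x)
        ndx = np.linalg.norm(x - x_prev)
        ndg = np.linalg.norm(g - g_prev)
        if ndg > 1e-12:                  # skip update on flat segments
            lam_new = min(np.sqrt(1.0 + theta) * lam,  # geometric growth
                          0.5 * ndx / ndg)             # curvature cap
            theta, lam = lam_new / lam, lam_new
        x_prev, g_prev = x, g
        x = x - lam * g
    return x

# hypothetical usage: quadratic f(x) = 0.5 x^T A x, minimizer at 0
A = np.diag([1.0, 3.0])
sol = curvature_adaptive_gd(lambda x: A @ x, np.ones(2))
```

Note that no knowledge of the smoothness constant is supplied: the stepsize finds the right scale on its own, which is the behavior the paper's accelerated variant aims to preserve.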

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper develops an accelerated variant of the GRAAL algorithm that combines Nesterov momentum with adaptive stepsize selection based on local curvature estimation. It resides in the 'Local Curvature Estimation Without Line Search' leaf, which contains seven papers total, indicating a moderately populated research direction. This leaf focuses specifically on methods that infer smoothness or Lipschitz constants dynamically without backtracking procedures, distinguishing it from line-search-based approaches in a sibling leaf. The paper's core contribution—achieving accelerated convergence while maintaining GRAAL's adaptive stepsize capabilities—addresses a natural extension question in this subfield.

The taxonomy reveals that this work sits at the intersection of two major branches: 'Adaptive Stepsize Selection Mechanisms' and 'Acceleration Frameworks and Momentum Techniques'. The sibling leaf 'Accelerated Gradient Methods with Adaptive Stepsizes' contains three papers exploring similar acceleration themes but through different mechanisms. Neighboring leaves address related but distinct approaches: 'Polyak and Barzilai-Borwein Stepsize Strategies' uses gradient-difference-based rules rather than curvature estimates, while 'Universal and Parameter-Free Gradient Methods' emphasizes broader adaptivity to noise and smoothness. The paper's positioning suggests it bridges local curvature adaptation with optimal-rate acceleration, a combination less explored in adjacent branches.

Among thirty candidates examined, the contribution on near-optimal complexity for L-smooth functions shows three refutable candidates, indicating substantial prior work in this area. The core algorithmic contribution (accelerated GRAAL with adaptive stepsize) examined ten candidates with zero refutations, suggesting greater novelty in the specific combination of techniques. The contribution addressing (L0,L1)-smoothness also examined ten candidates without refutation. The limited search scope means these statistics reflect top-semantic-match overlap rather than exhaustive field coverage. The algorithmic novelty appears stronger than the complexity-bound claims, where existing accelerated adaptive methods provide closer precedents.

Based on the thirty-candidate search, the work appears to occupy a meaningful but not entirely unexplored niche. The taxonomy structure shows this is an active area with multiple related approaches, and the refutation statistics suggest the acceleration-with-adaptation combination is less saturated than the complexity guarantees themselves. The analysis does not cover the full breadth of optimization literature, so additional related work may exist beyond the top-semantic matches examined. The paper's positioning within a seven-paper leaf suggests moderate but not extreme crowding in this specific methodological direction.

Taxonomy

Core-task Taxonomy Papers: 37
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 3

Research Landscape Overview

Core task: accelerated adaptive gradient method for convex optimization with local curvature estimation.

The field of accelerated adaptive gradient methods for convex optimization has evolved into a rich landscape organized around several complementary themes. Adaptive Stepsize Selection Mechanisms explore how to tune learning rates without exhaustive line search, often relying on local curvature estimates or gradient-based heuristics to balance computational cost and convergence speed. Acceleration Frameworks and Momentum Techniques build on Nesterov-style momentum and related schemes to achieve optimal rates, while Specialized Problem Settings and Constraints address structured domains such as proximal operators, feasible sets, or non-Euclidean geometries. Stochastic and Online Optimization extends these ideas to noisy or streaming data, and Specialized Application Domains tailor methods to machine learning, differential privacy, or large-scale scenarios. Representative works like Linesearch-free Bregman Proximal[2] and Uniformly Optimal Without Linesearch[5] illustrate the drive to eliminate costly subroutines, while Accelerated Quasi-Newton Extragradient[3] and High-order Accumulative Regularization[4] show how second-order information can be incorporated efficiently.

A particularly active line of research focuses on local curvature estimation without line search, where methods infer smoothness or Lipschitz constants on the fly to adapt stepsizes dynamically. This branch contrasts with classical backtracking approaches by avoiding repeated function evaluations, trading off some theoretical guarantees for practical efficiency. Nesterov GRAAL[0] sits squarely within this cluster, emphasizing acceleration combined with curvature-aware stepsize rules that do not require explicit line search.
It shares conceptual ground with Local Curvature Descent[13], which also leverages gradient-based curvature proxies, and with Adaptive Proximal Local Lipschitz[24], which adapts to local geometry in proximal settings. Compared to Adaptive Without Descent[28], which relaxes monotone descent assumptions, Nesterov GRAAL[0] retains acceleration guarantees while still avoiding line search overhead. This positioning highlights an ongoing tension in the field: balancing the simplicity and speed of parameter-free methods against the convergence assurances of more conservative, line-search-based schemes.

Claimed Contributions

Accelerated GRAAL algorithm with adaptive stepsize and Nesterov acceleration

The authors propose Accelerated GRAAL (Algorithm 1), a first-order optimization method that incorporates Nesterov acceleration while maintaining the ability to adapt stepsizes to local curvature geometrically. This resolves limitations of prior accelerated adaptive methods like AC-FGM and AdaNAG, which only allow sublinear stepsize growth.

10 retrieved papers
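The contribution above is about combining Nesterov momentum with geometrically growing, curvature-adapted stepsizes. As a point of contrast, here is a naive, hypothetical combination (emphatically NOT the paper's Algorithm 1) of the textbook Nesterov/FISTA recursion with the same finite-difference curvature cap; making such a combination provably near-optimal, with geometric stepsize growth, is precisely what the paper claims to achieve. All identifiers below are my own.

```python
import numpy as np

def naive_accelerated_adaptive(grad, x0, iters=400, lam0=1e-3):
    """Naive momentum + adaptive-stepsize sketch (hypothetical, no
    convergence guarantee in general): textbook Nesterov extrapolation
    with a stepsize capped by an inverse local-curvature estimate
    measured along the extrapolated points y_k."""
    x, x_old = x0.copy(), x0.copy()
    y_prev, g_prev = x0, grad(x0)
    lam, t = lam0, 1.0
    for _ in range(iters):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x + ((t - 1.0) / t_next) * (x - x_old)   # momentum step
        g = grad(y)
        ndg = np.linalg.norm(g - g_prev)
        if ndg > 1e-12:
            # grow geometrically, capped by half the inverse curvature
            lam = min(2.0 * lam, 0.5 * np.linalg.norm(y - y_prev) / ndg)
        x_old, x = x, y - lam * g                    # gradient step at y
        y_prev, g_prev, t = y, g, t_next
    return x
```

On well-conditioned problems this heuristic behaves reasonably, but without the paper's analysis there is no guarantee the momentum and the adaptive cap interact safely, which is the gap the claimed contribution addresses.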
Near-optimal iteration complexity for L-smooth convex functions without hyperparameter tuning

The authors prove that Algorithm 1 achieves the optimal iteration complexity $\mathcal{O}(\sqrt{L\Vert x_0 - x^*\Vert^2/\epsilon})$ for $L$-smooth functions up to additive logarithmic factors, without requiring hyperparameter tuning or line search procedures, as stated in Corollary 2.

10 retrieved papers
Can Refute
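The gap between the unaccelerated and accelerated bounds is easy to quantify with concrete (purely illustrative, not from the paper) numbers:

```python
# Iteration counts implied by the two complexity bounds, for
# hypothetical values of the smoothness constant L, the squared
# initial distance R2 = ||x0 - x*||^2, and the target accuracy eps.
L, R2, eps = 1e4, 1.0, 1e-4
unaccelerated = L * R2 / eps          # O(L R^2 / eps)        -> 1e8 steps
accelerated = (L * R2 / eps) ** 0.5   # O(sqrt(L R^2 / eps))  -> 1e4 steps
```

With these numbers the accelerated bound is smaller by a factor of 10,000, which is why matching it without hyperparameter tuning is a meaningful claim.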
Near-optimal iteration complexity under (L0, L1)-smoothness assumption

The authors demonstrate that Algorithm 1 achieves iteration complexity matching the optimal rate for (L0, L1)-smooth functions up to additive constant factors that do not depend on precision epsilon. This is the first adaptive algorithm to achieve such results under this more general smoothness condition.

10 retrieved papers
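For reference, the $(L_0, L_1)$-smoothness condition of Zhang et al. (2019) invoked in this contribution bounds the local curvature by an affine function of the gradient norm (stated here in its twice-differentiable form):

```latex
% (L_0, L_1)-smoothness (Zhang et al., 2019), twice-differentiable case:
\|\nabla^2 f(x)\| \;\le\; L_0 + L_1\,\|\nabla f(x)\| \qquad \text{for all } x.
% Setting L_1 = 0 recovers standard L-smoothness with L = L_0.
```

This strictly generalizes $L$-smoothness: curvature may grow unboundedly as long as the gradient norm grows with it, so a fixed stepsize $1/L$ is unavailable and adaptivity becomes essential.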

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Accelerated GRAAL algorithm with adaptive stepsize and Nesterov acceleration

The authors propose Accelerated GRAAL (Algorithm 1), a first-order optimization method that incorporates Nesterov acceleration while maintaining the ability to adapt stepsizes to local curvature geometrically. This resolves limitations of prior accelerated adaptive methods like AC-FGM and AdaNAG, which only allow sublinear stepsize growth.

Contribution

Near-optimal iteration complexity for L-smooth convex functions without hyperparameter tuning

The authors prove that Algorithm 1 achieves the optimal iteration complexity $\mathcal{O}(\sqrt{L\Vert x_0 - x^*\Vert^2/\epsilon})$ for $L$-smooth functions up to additive logarithmic factors, without requiring hyperparameter tuning or line search procedures, as stated in Corollary 2.

Contribution

Near-optimal iteration complexity under (L0, L1)-smoothness assumption

The authors demonstrate that Algorithm 1 achieves iteration complexity matching the optimal rate for (L0, L1)-smooth functions up to additive constant factors that do not depend on precision epsilon. This is the first adaptive algorithm to achieve such results under this more general smoothness condition.