Improving Online-to-Nonconvex Conversion for Smooth Optimization via Double Optimism

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Online-to-Nonconvex Conversion, Optimistic Gradient Descent, Smooth Nonconvex Optimization
Abstract:

A recent breakthrough in nonconvex optimization is the online-to-nonconvex conversion framework of Cutkosky et al. (2023), which reformulates the task of finding an ε-first-order stationary point as an online learning problem. When both the gradient and the Hessian are Lipschitz continuous, instantiating this framework with two different online learners achieves a complexity of O(ε^{-1.75} log(1/ε)) in the deterministic case and a complexity of O(ε^{-3.5}) in the stochastic case. However, this approach suffers from several limitations: (i) the deterministic method relies on a complex double-loop scheme that solves a fixed-point equation to construct hint vectors for an optimistic online learner, introducing an extra logarithmic factor; (ii) the stochastic method assumes a bounded second-order moment of the stochastic gradient, which is stronger than standard variance bounds; and (iii) different online learning algorithms are used in the two settings. In this paper, we address these issues by introducing an online optimistic gradient method based on a novel doubly optimistic hint function. Specifically, we use the gradient at an extrapolated point as the hint, motivated by two optimistic assumptions: that the difference between the hint and the target gradient remains near constant, and that consecutive update directions change slowly due to smoothness. Our method eliminates the need for a double loop and removes the logarithmic factor. Furthermore, by simply replacing full gradients with stochastic gradients and under the standard assumption that their variance is bounded by σ², we obtain a unified algorithm with complexity O(ε^{-1.75} + σ² ε^{-3.5}), smoothly interpolating between the best-known deterministic rate and the optimal stochastic rate.
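As a rough illustration of the mechanism the abstract describes, the sketch below runs a projected optimistic online gradient step whose hint is the gradient at an extrapolated point x_t + Δ_{t-1}, on the optimistic guesses that hint errors and update directions change slowly. The test function, step size η, update radius D, and the exact extrapolation rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def f(x):
    # smooth nonconvex test function with minima near x_i = +/-1
    return np.sum(0.25 * x**4 - 0.5 * x**2)

def grad_f(x):
    return x**3 - x

def clip_to_ball(v, D):
    # project onto the Euclidean ball of radius D
    n = np.linalg.norm(v)
    return v if n <= D else v * (D / n)

def doubly_optimistic_descent(x0, T=300, eta=0.05, D=0.1):
    x = x0.astype(float).copy()
    delta = np.zeros_like(x)   # previous update direction Delta_{t-1}
    w = np.zeros_like(x)       # online learner's internal state
    best_grad = np.inf
    for _ in range(T):
        hint = grad_f(x + delta)                 # hint: gradient at extrapolated point
        delta = clip_to_ball(w - eta * hint, D)  # optimistic play of the online learner
        x = x + delta                            # outer iterate update
        g = grad_f(x)                            # gradient actually observed
        w = clip_to_ball(w - eta * g, D)         # learner's projected gradient step
        best_grad = min(best_grad, np.linalg.norm(g))
    return x, best_grad

x_final, best = doubly_optimistic_descent(np.array([2.0, -1.5]))
```

From x0 = (2.0, -1.5), the smallest gradient norm observed over the run falls well below its initial value of about 6.3, with the iterate settling near a stationary point.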

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a doubly optimistic hint function within the online-to-nonconvex conversion framework, aiming to find first-order stationary points in smooth nonconvex optimization. It sits in the 'Online-to-Nonconvex Conversion Frameworks' leaf, which contains only two papers including the original work. This is a notably sparse research direction within a taxonomy of 50 papers across 26 leaf nodes, suggesting that the conversion-based approach remains relatively underexplored compared to more established branches like adaptive gradient methods or momentum-based schemes.

The taxonomy reveals that neighboring leaves include 'Adaptive Gradient Methods' (four papers on Adam-type algorithms), 'Fixed-Stepsize and Momentum Methods' (two papers on classical acceleration), and 'Variance-Reduced and Retrospective Methods' (two papers on stochastic gradient refinement). The online-to-nonconvex conversion approach diverges from these by reformulating optimization as an online learning problem with regret-based analysis, rather than directly designing step-size rules or momentum schedules. This structural separation indicates that the paper's core methodology—leveraging optimism from online learning—occupies a distinct conceptual niche within the broader algorithm design landscape.

Among 20 candidates examined through semantic search and citation expansion, no papers were found that clearly refute the three main contributions. The unified algorithm achieving best-known rates in both deterministic and stochastic settings was assessed against 10 candidates, none of which provided overlapping prior work. Similarly, the adaptive step-size scheme for parameter-free optimization was compared to 10 candidates without finding refutable overlap. The doubly optimistic hint function itself was not matched against any candidates in the search. These statistics reflect a limited search scope rather than exhaustive coverage, but suggest that within the examined literature, the specific combination of techniques appears novel.

Given the sparse population of the online-to-nonconvex conversion leaf and the absence of refutable prior work among 20 examined candidates, the paper's contributions appear to advance a relatively young research direction. However, the limited search scale means that closely related work in adjacent areas—such as optimistic online learning or parameter-free adaptive methods—may not have been fully captured. The analysis provides evidence of novelty within the examined scope but does not rule out relevant prior work beyond the top-20 semantic matches.

Taxonomy

- Core-task Taxonomy Papers: 50
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 20
- Refutable Papers: 0

Research Landscape Overview

Core task: finding first-order stationary points in smooth nonconvex optimization. The field has matured into several well-defined branches that address complementary aspects of this central challenge. Complexity Analysis and Lower Bounds (e.g., Nonconvex Lower Bounds[1], Oracle Complexity[34]) establishes fundamental limits on what any algorithm can achieve, while Algorithm Design and Convergence explores practical schemes—ranging from adaptive methods like Adaptive Nonconvex[3] and UAdam[18] to momentum-based approaches such as Learning Rate Free Momentum[4] and online-to-nonconvex conversion frameworks like Online to Nonconvex[31]. Constrained settings are handled by a dedicated branch (Constrained Nonconvex Optimization) that includes works on functional constraints (Functional Constrained[5]) and affine constraints (Affinely Constrained Bounds[13]), while Nonsmooth and Composite Optimization tackles problems where smoothness assumptions break down (Gradient Free Nonsmooth[7], Proximal Nonsmooth[44]). Higher-Order and Second-Order Methods pursue faster convergence by exploiting curvature information (Fast Newton Variant[16], Second Order Stationary[30]), and Specialized Problem Structures investigates domains like manifolds (Manifold Acceleration[15]) and bilevel optimization (Bilevel Stationary Points[35]). Recent activity has concentrated on bridging online learning techniques with nonconvex settings and on refining complexity guarantees under weaker assumptions. Double Optimism[0] sits within the online-to-nonconvex conversion framework, closely aligned with Online to Nonconvex[31], which systematically translates regret bounds from online convex optimization into nonconvex convergence rates. 
This line of work contrasts with more classical adaptive gradient methods (Adaptive Nonconvex[3], Adaptive Gradient Convergence[8]) that tune step sizes based on observed gradient norms, and with momentum schemes (Learning Rate Free Momentum[4]) that accelerate convergence through inertial dynamics. By leveraging optimism—a technique originally designed for online regret minimization—Double Optimism[0] offers a distinct algorithmic pathway that exploits predictive gradient information, positioning it as a natural extension of conversion-based approaches while differing in emphasis from direct adaptive or momentum-driven designs.

Claimed Contributions

Doubly optimistic hint function for online-to-nonconvex conversion

The authors propose a new hint function that evaluates the gradient at an extrapolated point, motivated by two optimistic assumptions: that the difference between hint and target gradient remains near constant, and that consecutive update directions change slowly due to smoothness. This eliminates the need for a double-loop fixed-point iteration used in prior work.

0 retrieved papers

Unified algorithm achieving best-known rates in deterministic and stochastic settings

The authors develop a single algorithm that achieves complexity O(ε^{-1.75}) in the deterministic setting (matching the best-known rate without logarithmic factors) and O(σ² ε^{-3.5}) in the stochastic setting (matching the optimal rate), smoothly interpolating between both regimes using only standard variance assumptions.
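Concretely, the claimed unified bound interpolates between the two regimes; a reading of the rate, with ε and σ as in the abstract:

```latex
\underbrace{\mathcal{O}\bigl(\varepsilon^{-1.75} + \sigma^{2}\,\varepsilon^{-3.5}\bigr)}_{\text{unified bound}}
=
\begin{cases}
\mathcal{O}(\varepsilon^{-1.75}) & \text{if } \sigma^{2} \lesssim \varepsilon^{1.75} \quad \text{(near-deterministic regime)},\\
\mathcal{O}(\sigma^{2}\varepsilon^{-3.5}) & \text{if } \sigma^{2} \gtrsim \varepsilon^{1.75} \quad \text{(noise-dominated regime)},
\end{cases}
```

with the crossover at σ² = ε^{1.75}, where the two terms are equal.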

10 retrieved papers

Adaptive step size scheme for parameter-free optimization

The authors introduce an adaptive step size scheme that automatically adjusts based on past gradient information, eliminating the need for manual tuning while maintaining the same complexity guarantees and offering improved dependence on local rather than global Lipschitz constants.
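The description above matches a broader family of curvature-adaptive step sizes. A generic sketch of that idea (not the paper's scheme) estimates a local Lipschitz constant from successive gradients and scales the step accordingly; the test function, the 0.5 safety factor, and the initial step η₀ are illustrative assumptions.

```python
import numpy as np

def grad_f(x):
    # gradient of the smooth nonconvex function 0.25*x^4 - 0.5*x^2
    return x**3 - x

def adaptive_gd(x0, T=100, eta0=0.1):
    x_prev = x0.astype(float).copy()
    g_prev = grad_f(x_prev)
    x = x_prev - eta0 * g_prev          # one warm-up step at a fixed rate
    for _ in range(T):
        g = grad_f(x)
        dx = np.linalg.norm(x - x_prev)
        dg = np.linalg.norm(g - g_prev)
        # secant estimate of the *local* Lipschitz constant of the gradient
        L_local = dg / dx if dx > 0 else 1.0
        eta = 0.5 / max(L_local, 1e-8)  # step adapts to local, not global, curvature
        x_prev, g_prev = x, g
        x = x - eta * g
    return x

x_star = adaptive_gd(np.array([2.0]))   # converges toward the stationary point x = 1
```

Starting from x = 2, the secant estimate shrinks from the steep local curvature near the start to the much flatter curvature near the minimizer, so the step size grows automatically without manual tuning.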

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Doubly optimistic hint function for online-to-nonconvex conversion


Contribution

Unified algorithm achieving best-known rates in deterministic and stochastic settings


Contribution

Adaptive step size scheme for parameter-free optimization
