Derandomized Online-to-Non-convex Conversion for Stochastic Weakly Convex Optimization

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: non-smooth optimization, non-convex optimization, stochastic gradient descent with momentum, online learning, neural networks
Abstract:

Online-to-non-convex conversion (O2NC) is an online-learning framework for producing Goldstein $(\delta,\epsilon)$-stationary points of non-smooth non-convex functions with optimal oracle complexity $\mathcal{O}(\delta^{-1}\epsilon^{-3})$. With auxiliary \emph{random interpolation or scaling}, O2NC recapitulates the stochastic gradient descent with momentum (SGDM) algorithm popularly used for training neural networks. Randomization, however, introduces deviations from practical SGDM. A natural question thus arises: can we derandomize O2NC to achieve the same optimal guarantees while still resembling SGDM? On the negative side, the general answer is \emph{no}, due to the impossibility results of \citet{jordan23deterministic} showing that no dimension-free rate can be achieved by deterministic algorithms. On the positive side, as the primary contribution of the present work, we show that O2NC can be naturally derandomized for \emph{weakly convex} functions. Remarkably, our deterministic algorithm converges at an optimal rate as long as the weak convexity parameter is no larger than $\mathcal{O}(\delta^{-1}\epsilon^{-1})$. In other words, the stronger the stationarity we demand, the more non-convexity our optimizer can tolerate. Additionally, we develop a periodically restarted variant of our method that allows for more progressive updates when the iterates are far from stationarity. The resulting algorithm, which corresponds to a momentum-restarted version of SGDM, is empirically shown to be effective and efficient for training ResNet and ViT networks.
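For readers unfamiliar with the SGDM baseline that the abstract says O2NC recapitulates, the update can be sketched as follows. This is a minimal, generic illustration (function names, defaults, and the exponential-averaging form are our assumptions, not the paper's exact algorithm):

```python
def sgdm(grad, x0, lr=0.01, beta=0.9, steps=100):
    """Minimal SGDM sketch: m_t = beta*m_{t-1} + (1-beta)*g_t, x_{t+1} = x_t - lr*m_t."""
    x, m = x0, 0.0
    for _ in range(steps):
        g = grad(x)                      # (stochastic) gradient oracle
        m = beta * m + (1.0 - beta) * g  # exponential moving average of gradients
        x = x - lr * m                   # momentum step
    return x

# Usage: minimize f(x) = x^2 from x0 = 5.0 (gradient is 2x)
x_star = sgdm(lambda x: 2.0 * x, 5.0)
```

The randomized O2NC analysis perturbs or rescales such updates to obtain its guarantees; the paper's contribution is removing that randomness for weakly convex objectives.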

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a deterministic variant of online-to-non-convex conversion specifically for weakly convex functions, achieving optimal oracle complexity without random interpolation or scaling. It occupies the sole leaf within the Deterministic Online-to-Non-convex Conversion Methods branch, with no sibling papers in the taxonomy. This positioning suggests the paper addresses a relatively sparse research direction, as the taxonomy contains only three total papers across three distinct branches, indicating limited prior exploration of deterministic conversion frameworks in this setting.

The taxonomy reveals two neighboring branches: Randomized Online-to-Non-convex Conversion Methods and Robust Conversion Under Heavy-Tailed Noise. The randomized branch handles general non-smooth non-convex objectives using stochastic techniques, while the robust branch addresses heavy-tailed gradient noise through clipping mechanisms. The paper diverges from these directions by eliminating randomness entirely and restricting attention to weakly convex functions, trading generality for deterministic guarantees. The scope notes clarify that weak convexity assumptions distinguish this work from general non-smooth methods, while the exclusion of heavy-tailed settings separates it from robust variants.

Among fourteen candidates examined, the core derandomization contribution shows no clear refutation across four candidates reviewed. However, the periodically restarted OGD variant with momentum-restarted SGDM reveals one refutable candidate among six examined, and the regularized Goldstein stationarity criterion shows one refutable candidate among four examined. These statistics suggest the algorithmic framework appears relatively novel within the limited search scope, though specific technical components encounter more substantial prior work. The modest candidate pool indicates the analysis captures nearby semantic matches rather than exhaustive field coverage.

Based on the limited literature search of fourteen candidates, the derandomization framework for weakly convex functions appears to occupy underexplored territory, though certain algorithmic and stationarity components show overlap with existing work. The sparse taxonomy structure and absence of sibling papers reinforce this impression, though the small search scope precludes definitive claims about field-wide novelty. The analysis primarily reflects proximity to top semantic matches rather than comprehensive prior art coverage.

Taxonomy

Core-task Taxonomy Papers: 2
Claimed Contributions: 3
Contribution Candidate Papers Compared: 14
Refutable Papers: 2

Research Landscape Overview

Core task: derandomized online-to-non-convex conversion for stochastic weakly convex optimization. The field centers on techniques that transform online learning algorithms into methods for stochastic non-convex optimization, particularly for weakly convex objectives. The taxonomy reveals three main branches. Deterministic Online-to-Non-convex Conversion Methods focus on derandomization strategies that eliminate or reduce the randomness inherent in classical conversion schemes, aiming for more predictable and reproducible convergence guarantees. Randomized Online-to-Non-convex Conversion Methods retain stochastic elements in the conversion process, often trading simplicity or computational efficiency for probabilistic guarantees. Robust Online-to-Non-convex Conversion Under Heavy-Tailed Noise addresses settings where gradient noise exhibits heavy tails, requiring specialized techniques to maintain convergence without strong moment assumptions. These branches collectively address how to leverage the rich toolkit of online learning in the challenging landscape of non-convex stochastic optimization.

A central theme across these branches is the trade-off between determinism and computational overhead: derandomized approaches promise stability but may require more sophisticated algorithmic machinery, while randomized methods remain simpler at the cost of probabilistic guarantees. Heavy-tailed noise settings introduce additional complexity, as standard averaging or clipping strategies may fail without careful design.

Derandomized Online Conversion[0] sits squarely within the deterministic branch, emphasizing optimal derandomized conversion for weakly convex functions. Its focus contrasts with earlier work such as Online Nonconvex Conversion[2], which laid foundational randomized conversion ideas, and with Heavy Tails Nonconvex[1], which tackles robustness under distributional challenges. By pursuing deterministic guarantees, Derandomized Online Conversion[0] addresses reproducibility and worst-case performance, distinguishing itself from probabilistic counterparts while remaining complementary to robust methods designed for heavy-tailed scenarios.

Claimed Contributions

Derandomized online-to-non-convex conversion for weakly convex functions

The authors develop a deterministic variant of the online-to-non-convex conversion (O2NC) framework specifically for weakly convex optimization. This derandomized algorithm eliminates the random interpolation or scaling steps required in prior O2NC methods while maintaining optimal oracle complexity and resembling standard SGDM more closely.

4 retrieved papers
Periodically restarted OGD variant with momentum-restarted SGDM

The authors introduce a novel periodically restarted online gradient descent procedure that resets momentum updates after fixed iteration periods. This variant corresponds to a momentum-restarted version of SGDM and enables more aggressive updates without explicit ball constraints, demonstrated to be effective for training ResNet and ViT networks.

6 retrieved papers
Can Refute
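The momentum-restart idea in this contribution can be sketched as a small modification of plain SGDM: zero the momentum buffer every fixed number of iterations. This is our illustrative reading only; the restart schedule, names, and defaults below are assumptions, not the paper's specification:

```python
def restarted_sgdm(grad, x0, lr=0.01, beta=0.9, period=25, steps=100):
    """SGDM whose momentum buffer is reset to zero every `period` iterations."""
    x, m = x0, 0.0
    for t in range(steps):
        if t % period == 0:
            m = 0.0                      # periodic momentum restart
        g = grad(x)                      # (stochastic) gradient oracle
        m = beta * m + (1.0 - beta) * g  # momentum accumulation within a period
        x = x - lr * m
    return x

# Usage: minimize f(x) = x^2 from x0 = 5.0 (gradient is 2x)
x_star = restarted_sgdm(lambda x: 2.0 * x, 5.0)
```

Intuitively, restarting discards stale momentum at period boundaries, which is what permits the larger ("more aggressive") per-period updates without an explicit ball constraint.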
Regularized Goldstein stationarity criterion

The authors define a new regularized version of Goldstein stationarity that simultaneously controls the scale of convex hulls of subgradients and proximity of underlying subsets. This criterion obviates the need for explicit tiny-ball constraints in the online learning module, allowing for potentially more progressive increment updates.

4 retrieved papers
Can Refute
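The paper's regularized criterion is not spelled out in this report, but the classical Goldstein notion it builds on is easy to illustrate: a point is (δ, ε)-stationary if the convex hull of (sub)gradients taken within a δ-ball contains an element of norm at most ε. A crude one-dimensional check (our sketch; the sampling grid and function names are ours):

```python
def goldstein_gap_1d(grad, x, delta, n=101):
    """Crude 1-D Goldstein check: sample gradients on [x - delta, x + delta];
    the minimal |convex combination| is 0 if the sampled gradients change sign,
    else the smallest sampled magnitude."""
    gs = [grad(x + delta * (2 * i / (n - 1) - 1)) for i in range(n)]
    lo, hi = min(gs), max(gs)
    if lo <= 0.0 <= hi:      # hull of sampled gradients contains 0
        return 0.0
    return min(abs(lo), abs(hi))

# f(x) = |x| is non-smooth at 0; its gradient is sign(x)
sgn = lambda x: -1.0 if x < 0 else 1.0
print(goldstein_gap_1d(sgn, 0.01, delta=0.1))  # 0.0 -> (0.1, eps)-stationary for any eps
print(goldstein_gap_1d(sgn, 1.0, delta=0.1))   # 1.0 -> not stationary
```

The regularized version claimed here additionally controls how the underlying subsets of the δ-ball relate to each other, which is what removes the need for tiny-ball constraints in the online module.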

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Derandomized online-to-non-convex conversion for weakly convex functions

The authors develop a deterministic variant of the online-to-non-convex conversion (O2NC) framework specifically for weakly convex optimization. This derandomized algorithm eliminates the random interpolation or scaling steps required in prior O2NC methods while maintaining optimal oracle complexity and resembling standard SGDM more closely.

Contribution: Periodically restarted OGD variant with momentum-restarted SGDM

The authors introduce a novel periodically restarted online gradient descent procedure that resets momentum updates after fixed iteration periods. This variant corresponds to a momentum-restarted version of SGDM and enables more aggressive updates without explicit ball constraints, demonstrated to be effective for training ResNet and ViT networks.

Contribution: Regularized Goldstein stationarity criterion

The authors define a new regularized version of Goldstein stationarity that simultaneously controls the scale of convex hulls of subgradients and proximity of underlying subsets. This criterion obviates the need for explicit tiny-ball constraints in the online learning module, allowing for potentially more progressive increment updates.