Better LMO-based Momentum Methods with Second-Order Information
Overview
Overall Novelty Assessment
The paper proposes integrating Hessian-corrected momentum into the Linear Minimization Oracle (LMO) framework, targeting improved convergence rates under relaxed smoothness and arbitrary norms. It resides in the 'LMO-Based and Second-Order Momentum Methods' leaf, which currently contains only this paper. This leaf sits within the broader 'Specialized Momentum Techniques and Extensions' branch, indicating a relatively sparse research direction compared to more populated areas such as 'Adam and AMSGrad Variants' or 'Momentum Methods for Non-Smooth Non-Convex Problems', each of which contains multiple sibling papers.
The taxonomy reveals that neighboring leaves address related but distinct momentum formulations: 'Heavy-Ball and Nesterov Momentum Variants' explores classical momentum schemes, 'Model-Based and Proximal Momentum Methods' focuses on proximal operators for weakly convex problems, and 'Shuffling and Finite-Sum Optimization' targets finite-sum settings. The paper's LMO-based approach diverges from these gradient-centric methods by substituting gradient steps with structured subproblem solves, positioning it at the intersection of second-order information and oracle-based optimization, a boundary explicitly noted in the taxonomy's exclusion criteria for first-order methods.
Of the thirty candidates examined (ten per contribution), the improved-convergence-rate contribution yielded one potentially refuting candidate, while the LMO-based framework and empirical validation contributions yielded no clear refutations. The convergence rate claim thus appears to have more substantial prior-work overlap within the limited search scope, whereas the LMO-arbitrary-norm integration and the empirical validation on MLPs and LSTMs show fewer direct precedents among the examined papers. This suggests the algorithmic framework may occupy a less explored niche, though the convergence rate improvement builds on more established theoretical territory.
Based on the top thirty semantic matches and the taxonomy structure, the work appears to address a relatively underexplored intersection of LMO methods, second-order momentum, and arbitrary norms. The limited sibling count in its taxonomy leaf and the sparse refutation rate for two of the three contributions suggest potential novelty, though the analysis is not exhaustive beyond the examined candidates. The convergence rate contribution warrants closer scrutiny given the identified overlap, while the framework integration and empirical scope appear less directly anticipated by prior work within the search scope.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors extend LMO-based optimization algorithms by integrating two variants of second-order momentum (Hessian-corrected momentum). This generalizes prior second-order momentum methods from the Euclidean norm setting to arbitrary norm settings, allowing the algorithms to adapt to the geometry of the problem.
The authors prove that their LMO-based methods with second-order momentum achieve an O(1/K^(1/3)) convergence rate in the expected gradient norm under relaxed smoothness conditions and arbitrary norms. This improves upon the O(1/K^(1/4)) rate of existing LMO-based methods with Polyak momentum and matches known rates for second-order momentum under standard smoothness in Euclidean settings.
The authors conduct experiments on nonconvex problems including MLP and LSTM training to validate their theoretical findings. The empirical results demonstrate that LMO-based methods with second-order momentum significantly outperform methods using Polyak momentum and extrapolated momentum.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
LMO-based methods with second-order momentum under arbitrary norms
The authors extend LMO-based optimization algorithms by integrating two variants of second-order momentum (Hessian-corrected momentum). This generalizes prior second-order momentum methods from the Euclidean norm setting to arbitrary norm settings, allowing the algorithms to adapt to the geometry of the problem.
[68] Generalized momentum-based methods: A Hamiltonian perspective PDF
[69] KKT Conditions, First-Order and Second-Order Optimization, and Distributed Optimization: Tutorial and Survey PDF
[70] Implicit Bias of AdamW: Norm Constrained Optimization PDF
[71] Gradient regularization of Newton method with Bregman distances PDF
[72] Robust Low-Rank Tensor Minimization via a New Tensor Spectral k-Support Norm PDF
[73] Projection based methods for conic linear programming – optimal first order complexities and norm constrained quasi newton methods PDF
[74] Norm constrained empirical portfolio optimization with stochastic dominance: Robust optimization non-asymptotics PDF
[75] A direct approach to second-order MCSCF calculations using a norm extended optimization scheme PDF
[76] An analysis of the stress formula for energy-momentum methods in nonlinear elastodynamics PDF
[77] The dynamics of matrix momentum PDF
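To make the contribution concrete, the following is a minimal sketch of what an LMO-based update with Hessian-corrected momentum can look like. This is not the paper's algorithm: it assumes a STORM-style correction term, an ℓ∞-ball LMO, and exact gradient/Hessian-vector oracles, and all function and parameter names (`lmo_linf`, `hessian_corrected_momentum_lmo`, `gamma`, `beta`, `radius`) are hypothetical choices for illustration.

```python
import numpy as np

def lmo_linf(g, radius=1.0):
    # LMO over the l-infinity ball: argmin_{||d||_inf <= radius} <g, d>.
    # For other norms (e.g., spectral norm on matrices) this oracle changes,
    # which is how the method adapts to the problem's geometry.
    return -radius * np.sign(g)

def hessian_corrected_momentum_lmo(grad, hvp, x0, steps=200,
                                   gamma=0.05, beta=0.1, radius=1.0):
    """Sketch of an LMO step driven by Hessian-corrected momentum.

    grad(x)   : gradient oracle
    hvp(x, v) : Hessian-vector product oracle
    The momentum estimate m tracks grad f(x); the Hessian term corrects
    for the drift incurred by moving from the previous iterate to x.
    """
    x_prev = x0.copy()
    m = grad(x0)
    x = x0 + gamma * lmo_linf(m, radius)
    for _ in range(steps):
        g = grad(x)
        # Hessian-corrected momentum (assumed STORM-style form):
        # m_k = (1 - beta) * (m_{k-1} + H(x_k)(x_k - x_{k-1})) + beta * g_k
        m = (1 - beta) * (m + hvp(x, x - x_prev)) + beta * g
        x_prev, x = x, x + gamma * lmo_linf(m, radius)
    return x

# Toy usage on a convex quadratic f(x) = 0.5 * x^T A x (minimizer at 0):
A = np.diag([1.0, 2.0])
x_final = hessian_corrected_momentum_lmo(
    grad=lambda x: A @ x,
    hvp=lambda x, v: A @ v,
    x0=np.array([3.0, -2.0]),
)
```

On this quadratic the correction makes `m` track the gradient exactly: if `m` equals `A @ x_prev`, then `(1-beta)*(m + A @ (x - x_prev)) + beta*(A @ x)` equals `A @ x`. With the fixed step size used here the iterates settle into a small neighborhood of the minimizer rather than converging exactly, which is the usual behavior of sign-like LMO steps with constant `gamma`.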
Improved O(1/K^(1/3)) convergence rate under relaxed smoothness
The authors prove that their LMO-based methods with second-order momentum achieve an O(1/K^(1/3)) convergence rate in the expected gradient norm under relaxed smoothness conditions and arbitrary norms. This improves upon the O(1/K^(1/4)) rate of existing LMO-based methods with Polyak momentum and matches known rates for second-order momentum under standard smoothness in Euclidean settings.
[51] Convex and non-convex optimization under generalized smoothness PDF
[1] Parameter-agnostic optimization under relaxed smoothness PDF
[12] On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions PDF
[13] Revisiting Convergence of AdaGrad with Relaxed Assumptions PDF
[52] Convergence of Descent Optimization Algorithms under Polyak-Åojasiewicz-Kurdyka Conditions PDF
[53] Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed Smoothness Conditions PDF
[54] Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization PDF
[55] Near-Optimal Convergence of Accelerated Gradient Methods under Generalized and -Smoothness PDF
[56] Dual Acceleration for Minimax Optimization: Linear Convergence Under Relaxed Assumptions PDF
[57] Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes PDF
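To see what the rate improvement means in iteration counts: reaching an expected gradient norm of at most ε requires K on the order of ε⁻³ under an O(1/K^(1/3)) rate, versus ε⁻⁴ under the O(1/K^(1/4)) Polyak-momentum rate. The back-of-envelope arithmetic below suppresses all constants and is purely illustrative, not taken from the paper.

```python
def iters_needed(eps, exponent):
    # Smallest K (ignoring constants) with K**(-exponent) <= eps,
    # i.e. K = eps**(-1/exponent).
    return eps ** (-1.0 / exponent)

eps = 1e-2
k_second_order = iters_needed(eps, 1 / 3)  # O(1/K^(1/3)) rate -> eps^-3
k_polyak = iters_needed(eps, 1 / 4)        # O(1/K^(1/4)) rate -> eps^-4
print(round(k_second_order), round(k_polyak))  # 1000000 vs 100000000
```

At ε = 10⁻² the gap is already two orders of magnitude (10⁶ vs 10⁸ iterations), and it widens as ε shrinks, which is why closing the exponent gap is the central theoretical claim.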
Empirical validation on MLP and LSTM training tasks
The authors conduct experiments on nonconvex problems including MLP and LSTM training to validate their theoretical findings. The empirical results demonstrate that LMO-based methods with second-order momentum significantly outperform methods using Polyak momentum and extrapolated momentum.