Adaptive gradient descent on Riemannian manifolds and its applications to Gaussian variational inference

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Adaptive method, Riemannian optimization, Variational Inference
Abstract:

We propose RAdaGD, a novel family of adaptive gradient descent methods on general Riemannian manifolds. RAdaGD adapts the step size parameter without line search, and includes instances that achieve a non-ergodic convergence guarantee, $f(x_k) - f(x_\star) \le \mathcal{O}(1/k)$, under local geodesic smoothness and generalized geodesic convexity. A core application of RAdaGD is Gaussian Variational Inference, where our method provides the first convergence guarantee in the absence of L-smoothness of the target log-density, under additional technical assumptions. We also investigate the empirical performance of RAdaGD in numerical simulations and demonstrate its competitiveness in comparison to existing algorithms.
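For intuition, the following is a minimal sketch of what a line-search-free adaptive Riemannian gradient step can look like. It assumes a Malitsky-Mishchenko-style step-size rule on the unit sphere, with projection-based retraction and vector transport; the helper functions and the specific rule are illustrative assumptions and do not reproduce the paper's RAdaGD update.

```python
# Illustrative only: line-search-free adaptive Riemannian gradient descent on the
# unit sphere with a Malitsky-Mishchenko-style step-size rule (an assumption,
# not the paper's RAdaGD rule). All helper names are hypothetical.
import numpy as np

def proj_tangent(x, v):
    # Project v onto the tangent space of the unit sphere at x.
    return v - np.dot(x, v) * x

def retract(x, v):
    # Retraction on the sphere: step in the tangent direction, then renormalize.
    y = x + v
    return y / np.linalg.norm(y)

def adaptive_riemannian_gd(egrad_f, x0, lam0=1e-3, steps=200):
    x = x0 / np.linalg.norm(x0)
    lam, theta = lam0, 1.0
    g = proj_tangent(x, egrad_f(x))               # Riemannian gradient at x
    for _ in range(steps):
        x_new = retract(x, -lam * g)
        g_new = proj_tangent(x_new, egrad_f(x_new))
        g_old = proj_tangent(x_new, g)            # crude vector transport by projection
        dist = np.linalg.norm(x_new - x)          # chordal surrogate for geodesic distance
        diff = np.linalg.norm(g_new - g_old)      # local smoothness estimate
        grow = np.sqrt(1.0 + theta) * lam
        lam_new = grow if diff == 0.0 else min(grow, dist / (2.0 * diff))
        theta, lam, x, g = lam_new / lam, lam_new, x_new, g_new
    return x

# Example: leading eigenvector of a symmetric matrix, i.e. maximize x^T A x on the sphere.
A = np.random.randn(5, 5); A = A + A.T
x_hat = adaptive_riemannian_gd(lambda x: -2.0 * A @ x, np.random.randn(5))
```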

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes RAdaGD, a family of adaptive gradient descent methods for Riemannian manifolds that achieve non-ergodic O(1/k) convergence under local geodesic smoothness and generalized geodesic convexity. It resides in the Deterministic Adaptive Gradient Descent leaf, which contains only three papers total, including the original work. This leaf sits within the broader Core Adaptive Gradient Methods branch, indicating a relatively sparse research direction focused on deterministic settings with rigorous convergence guarantees, as opposed to the more crowded stochastic variants.

The taxonomy reveals that the paper's immediate neighbors include Gradient Lower Bounded and Adaptive Gradient Nonnegative Curvature, both emphasizing geometric regularity conditions for convergence. The sibling category Stochastic Adaptive Gradient Methods contains six papers addressing mini-batch and Adam-like algorithms, reflecting a more active research direction. Nearby branches such as Second-Order Methods and Energy-Adaptive Methods explore alternative geometric frameworks, while Specialized Problem Formulations address bilevel and minimax settings. The deterministic leaf's scope explicitly excludes stochastic variants and variance reduction, positioning this work within a narrower but theoretically focused niche.

Among the twenty candidates examined, the Gaussian Variational Inference contribution has one refutable candidate, suggesting that prior work may already address convergence without L-smoothness under certain conditions. The RAdaGD algorithm itself was compared against four candidates with zero refutations, indicating potential novelty in its adaptive step-size mechanism. The local geodesic smoothness framework was compared against ten candidates without clear refutation, though the limited search scope means broader prior work may exist. These statistics reflect a targeted semantic search rather than exhaustive coverage, so the apparent novelty should be interpreted cautiously.

Based on the limited search of twenty candidates, the work appears to occupy a sparsely populated deterministic niche within a broader field that increasingly emphasizes stochastic methods. The taxonomy structure suggests the deterministic adaptive gradient direction receives less attention than stochastic counterparts, though the search scope does not capture the full landscape of Riemannian optimization literature.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Paper: 1

Research Landscape Overview

Core task: adaptive gradient descent on Riemannian manifolds. This field extends classical optimization to curved spaces where constraints and geometry are intrinsic to the problem. The taxonomy reveals a rich structure spanning ten major branches. Core Adaptive Gradient Methods form the algorithmic backbone, encompassing deterministic and stochastic variants that generalize Adam-style updates to manifolds, as seen in Riemannian Adaptive Methods[8] and Adaptive Stochastic Riemannian[2]. Second-Order and Trust-Region Methods leverage curvature information for faster convergence, while Energy-Adaptive and Metric-Adaptive Methods tailor the Riemannian metric itself to problem structure. Specialized Problem Formulations address bilevel optimization, minimax games, and constrained settings, whereas Acceleration and Momentum Techniques bring Nesterov-style ideas to curved spaces. Theoretical Foundations provide convergence guarantees under various geometric assumptions, and Meta-Learning and Learned Optimization explore data-driven tuning of manifold algorithms. Domain Applications range from signal processing to neural network training, supported by Computational Tools like Geoopt[23] and Rieoptax[48], with Specialized Manifolds focusing on structures such as Stiefel, Grassmann, and SPD matrices.

Several active lines of work highlight key trade-offs. Deterministic methods prioritize clean convergence theory, often assuming bounded curvature or lower-bounded gradients, as in Gradient Lower Bounded[49] and Adaptive Gradient Nonnegative Curvature[1]. Stochastic and variance-reduced approaches balance sample efficiency with geometric complexity, while energy-adaptive schemes dynamically adjust metrics to problem landscapes.

The original paper, Adaptive Gradient Riemannian[0], sits squarely within the deterministic adaptive gradient branch, emphasizing rigorous convergence analysis under geometric regularity conditions. Its focus contrasts with the stochastic emphasis of Adaptive Stochastic Riemannian[2] and the bilevel setting of Adaptive Bilevel Riemannian[3], yet shares common ground with Adaptive Gradient Nonnegative Curvature[1] in leveraging curvature bounds. Open questions persist around optimal step-size schedules, the interplay between metric adaptation and convergence speed, and scalability to high-dimensional manifolds encountered in modern machine learning.
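As a concrete reference point for the tooling mentioned above, the sketch below shows how an adaptive Riemannian optimizer is typically invoked through Geoopt; the API calls assume a recent geoopt release, and the toy problem (a Rayleigh quotient on the sphere) is chosen purely for illustration rather than taken from any of the cited papers.

```python
# Sketch of calling Geoopt's adaptive Riemannian optimizer (RiemannianAdam);
# assumes a recent geoopt release -- not code from the paper under review.
import torch
import geoopt

sphere = geoopt.manifolds.Sphere()
A = torch.randn(10, 10)
A = A + A.T                                   # symmetric matrix; minimize the Rayleigh quotient x^T A x

x = geoopt.ManifoldParameter(sphere.projx(torch.randn(10)), manifold=sphere)
opt = geoopt.optim.RiemannianAdam([x], lr=1e-2)

for _ in range(500):
    opt.zero_grad()
    loss = x @ A @ x                          # ||x|| = 1 is maintained by the manifold constraint
    loss.backward()
    opt.step()                                # adaptive step in the tangent space, then retraction
```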

Claimed Contributions

RAdaGD: Adaptive gradient descent on Riemannian manifolds

The authors introduce RAdaGD, a family of line-search-free adaptive gradient descent algorithms for Riemannian optimization. These methods automatically tune step sizes and achieve a non-ergodic convergence rate of O(1/k) under local geodesic smoothness and generalized geodesic convexity, which is claimed to be the first such rate for Riemannian adaptive methods.

4 retrieved papers
First convergence guarantee for GVI without L-smoothness

The authors apply RAdaGD to Gaussian Variational Inference and claim to provide the first algorithm with provable convergence guarantees when the target log-density is not globally L-smooth, requiring only a weaker growth condition and additional technical assumptions. An illustrative sketch of the underlying GVI update appears after this entry.

6 retrieved papers
Can Refute
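To make the setting of this claim concrete, the sketch below runs fixed-step Bures-Wasserstein gradient descent on a Gaussian target. The Bures-Wasserstein formulation of GVI (in the spirit of Lambert et al.), the fixed step size, and the quadratic potential are all assumptions chosen for illustration; they do not reproduce the paper's adaptive RAdaGD variant or its weaker growth condition.

```python
# Illustration only: fixed-step Bures-Wasserstein gradient descent for Gaussian
# variational inference on a Gaussian target pi proportional to exp(-V), V quadratic.
# The update form is an assumption (Lambert et al. style), not the paper's RAdaGD.
import numpy as np

d = 2
mu = np.array([1.0, -1.0])                       # target mean
P = np.array([[2.0, 0.5], [0.5, 1.0]])           # target precision: V(x) = 0.5 (x - mu)^T P (x - mu)

m, Sigma = np.zeros(d), np.eye(d)                # variational Gaussian N(m, Sigma)
lam = 0.1                                        # fixed step size; RAdaGD would adapt this

for _ in range(300):
    grad_m = P @ (m - mu)                        # E_q[grad V], closed form for quadratic V
    M = P - np.linalg.inv(Sigma)                 # E_q[hess V] - Sigma^{-1}
    m = m - lam * grad_m
    Sigma = (np.eye(d) - lam * M) @ Sigma @ (np.eye(d) - lam * M)

# m approaches mu and Sigma approaches P^{-1}, the exact posterior in this toy case.
```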
Local geodesic smoothness framework for broader function classes

The authors establish that their convergence analysis relies on local geodesic smoothness rather than global L-smoothness, which broadens the class of applicable functions. They prove that every twice continuously differentiable function on a complete Riemannian manifold satisfies local geodesic smoothness; one common way to formalize this condition is sketched after this entry.

10 retrieved papers
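For reference, one common way to formalize local geodesic smoothness is given below; the paper's precise definition may differ in its quantifiers and constants, so this should be read only as an illustrative formulation.

```latex
% One illustrative formalization of local geodesic smoothness; the paper's
% exact definition may differ.
\[
  f\bigl(\exp_x(v)\bigr) \;\le\; f(x)
  + \bigl\langle \operatorname{grad} f(x),\, v \bigr\rangle_x
  + \tfrac{L_U}{2}\, \lVert v \rVert_x^2
  \quad \text{for all } x \in U,\ v \in T_x\mathcal{M} \text{ with } \exp_x(tv) \in U,\ t \in [0,1],
\]
where $U \subseteq \mathcal{M}$ is a neighborhood (for instance a geodesically convex ball)
and $L_U > 0$ may depend on $U$. Global geodesic $L$-smoothness is the special case
$U = \mathcal{M}$ with a single constant; for $f \in C^2$ on a complete manifold such an
$L_U$ exists on every compact $U$, since $\operatorname{Hess} f$ is bounded there.
```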

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: RAdaGD: Adaptive gradient descent on Riemannian manifolds

Contribution: First convergence guarantee for GVI without L-smoothness

Contribution: Local geodesic smoothness framework for broader function classes