Adaptive gradient descent on Riemannian manifolds and its applications to Gaussian variational inference

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Adaptive method, Riemannian optimization, Variational Inference
Abstract:

We propose RAdaGD, a novel family of adaptive gradient descent methods on general Riemannian manifolds. RAdaGD adapts the step size parameter without line search, and includes instances that achieve a non-ergodic convergence guarantee, $f(x_k) - f(x_\star) \le \mathcal{O}(1/k)$, under local geodesic smoothness and generalized geodesic convexity. A core application of RAdaGD is Gaussian Variational Inference, where our method provides the first convergence guarantee in the absence of L-smoothness of the target log-density, under additional technical assumptions. We also investigate the empirical performance of RAdaGD in numerical simulations and demonstrate its competitiveness in comparison to existing algorithms.
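For intuition, the following is a minimal sketch of what a line-search-free adaptive Riemannian gradient step can look like. It assumes a Malitsky-Mishchenko-style step-size rule on the unit sphere, with projection-based retraction and vector transport; the helper functions and the specific rule are illustrative assumptions and do not reproduce the paper's RAdaGD update.

```python
# Illustrative only: line-search-free adaptive Riemannian gradient descent on the
# unit sphere with a Malitsky-Mishchenko-style step-size rule (an assumption,
# not the paper's RAdaGD rule). All helper names are hypothetical.
import numpy as np

def proj_tangent(x, v):
    # Project v onto the tangent space of the unit sphere at x.
    return v - np.dot(x, v) * x

def retract(x, v):
    # Retraction on the sphere: step in the tangent direction, then renormalize.
    y = x + v
    return y / np.linalg.norm(y)

def adaptive_riemannian_gd(egrad_f, x0, lam0=1e-3, steps=200):
    x = x0 / np.linalg.norm(x0)
    lam, theta = lam0, 1.0
    g = proj_tangent(x, egrad_f(x))               # Riemannian gradient at x
    for _ in range(steps):
        x_new = retract(x, -lam * g)
        g_new = proj_tangent(x_new, egrad_f(x_new))
        g_old = proj_tangent(x_new, g)            # crude vector transport by projection
        dist = np.linalg.norm(x_new - x)          # chordal surrogate for geodesic distance
        diff = np.linalg.norm(g_new - g_old)      # local smoothness estimate
        grow = np.sqrt(1.0 + theta) * lam
        lam_new = grow if diff == 0.0 else min(grow, dist / (2.0 * diff))
        theta, lam, x, g = lam_new / lam, lam_new, x_new, g_new
    return x

# Example: leading eigenvector of a symmetric matrix, i.e. maximize x^T A x on the sphere.
A = np.random.randn(5, 5); A = A + A.T
x_hat = adaptive_riemannian_gd(lambda x: -2.0 * A @ x, np.random.randn(5))
```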

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes RAdaGD, a family of adaptive gradient descent methods for Riemannian manifolds that achieve non-ergodic O(1/k) convergence under local geodesic smoothness and generalized geodesic convexity. It resides in the Deterministic Adaptive Gradient Descent leaf, which contains only three papers total, including the original work. This leaf sits within the broader Core Adaptive Gradient Methods branch, indicating a relatively sparse research direction focused on deterministic settings with rigorous convergence guarantees, as opposed to the more crowded stochastic variants.

The taxonomy reveals that the paper's immediate neighbors include Gradient Lower Bounded and Adaptive Gradient Nonnegative Curvature, both emphasizing geometric regularity conditions for convergence. The sibling category Stochastic Adaptive Gradient Methods contains six papers addressing mini-batch and Adam-like algorithms, reflecting a more active research direction. Nearby branches such as Second-Order Methods and Energy-Adaptive Methods explore alternative geometric frameworks, while Specialized Problem Formulations address bilevel and minimax settings. The deterministic leaf's scope explicitly excludes stochastic variants and variance reduction, positioning this work within a narrower but theoretically focused niche.

Among the twenty candidates examined, the Gaussian Variational Inference contribution has one refutable candidate, suggesting that prior work may already address convergence without L-smoothness under certain conditions. The RAdaGD algorithm itself was compared against four candidates with zero refutations, indicating potential novelty in its adaptive step-size mechanism. The local geodesic smoothness framework was compared against ten candidates without clear refutation, though the limited search scope means broader prior work may exist. These statistics reflect a targeted semantic search rather than exhaustive coverage, so the apparent novelty should be interpreted cautiously.

Based on the limited search of twenty candidates, the work appears to occupy a sparsely populated deterministic niche within a broader field that increasingly emphasizes stochastic methods. The taxonomy structure suggests the deterministic adaptive gradient direction receives less attention than stochastic counterparts, though the search scope does not capture the full landscape of Riemannian optimization literature.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Paper: 1

Research Landscape Overview

Core task: adaptive gradient descent on Riemannian manifolds. This field extends classical optimization to curved spaces where constraints and geometry are intrinsic to the problem. The taxonomy reveals a rich structure spanning ten major branches. Core Adaptive Gradient Methods form the algorithmic backbone, encompassing deterministic and stochastic variants that generalize Adam-style updates to manifolds, as seen in Riemannian Adaptive Methods[8] and Adaptive Stochastic Riemannian[2]. Second-Order and Trust-Region Methods leverage curvature information for faster convergence, while Energy-Adaptive and Metric-Adaptive Methods tailor the Riemannian metric itself to problem structure. Specialized Problem Formulations address bilevel optimization, minimax games, and constrained settings, whereas Acceleration and Momentum Techniques bring Nesterov-style ideas to curved spaces. Theoretical Foundations provide convergence guarantees under various geometric assumptions, and Meta-Learning and Learned Optimization explore data-driven tuning of manifold algorithms. Domain Applications range from signal processing to neural network training, supported by Computational Tools like Geoopt[23] and Rieoptax[48], with Specialized Manifolds focusing on structures such as Stiefel, Grassmann, and SPD matrices.

Several active lines of work highlight key trade-offs. Deterministic methods prioritize clean convergence theory, often assuming bounded curvature or lower-bounded gradients, as in Gradient Lower Bounded[49] and Adaptive Gradient Nonnegative Curvature[1]. Stochastic and variance-reduced approaches balance sample efficiency with geometric complexity, while energy-adaptive schemes dynamically adjust metrics to problem landscapes.

The original paper, Adaptive Gradient Riemannian[0], sits squarely within the deterministic adaptive gradient branch, emphasizing rigorous convergence analysis under geometric regularity conditions. Its focus contrasts with the stochastic emphasis of Adaptive Stochastic Riemannian[2] and the bilevel setting of Adaptive Bilevel Riemannian[3], yet shares common ground with Adaptive Gradient Nonnegative Curvature[1] in leveraging curvature bounds. Open questions persist around optimal step-size schedules, the interplay between metric adaptation and convergence speed, and scalability to high-dimensional manifolds encountered in modern machine learning.
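As a concrete reference point for the tooling mentioned above, the sketch below shows how an adaptive Riemannian optimizer is typically invoked through Geoopt; the API calls assume a recent geoopt release, and the toy problem (a Rayleigh quotient on the sphere) is chosen purely for illustration rather than taken from any of the cited papers.

```python
# Sketch of calling Geoopt's adaptive Riemannian optimizer (RiemannianAdam);
# assumes a recent geoopt release -- not code from the paper under review.
import torch
import geoopt

sphere = geoopt.manifolds.Sphere()
A = torch.randn(10, 10)
A = A + A.T                                   # symmetric matrix; minimize the Rayleigh quotient x^T A x

x = geoopt.ManifoldParameter(sphere.projx(torch.randn(10)), manifold=sphere)
opt = geoopt.optim.RiemannianAdam([x], lr=1e-2)

for _ in range(500):
    opt.zero_grad()
    loss = x @ A @ x                          # ||x|| = 1 is maintained by the manifold constraint
    loss.backward()
    opt.step()                                # adaptive step in the tangent space, then retraction
```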

Claimed Contributions

RAdaGD: Adaptive gradient descent on Riemannian manifolds

The authors introduce RAdaGD, a family of line-search-free adaptive gradient descent algorithms for Riemannian optimization. These methods automatically tune step sizes and achieve a non-ergodic convergence rate of O(1/k) under local geodesic smoothness and generalized geodesic convexity, which is claimed to be the first such rate for Riemannian adaptive methods.

4 retrieved papers
First convergence guarantee for GVI without L-smoothness

The authors apply RAdaGD to Gaussian Variational Inference and claim to provide the first algorithm with provable convergence guarantees when the target log-density is not globally L-smooth, requiring only a weaker growth condition and additional technical assumptions. An illustrative sketch of the underlying GVI update appears after this entry.

6 retrieved papers
Can Refute
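To make the setting of this claim concrete, the sketch below runs fixed-step Bures-Wasserstein gradient descent on a Gaussian target. The Bures-Wasserstein formulation of GVI (in the spirit of Lambert et al.), the fixed step size, and the quadratic potential are all assumptions chosen for illustration; they do not reproduce the paper's adaptive RAdaGD variant or its weaker growth condition.

```python
# Illustration only: fixed-step Bures-Wasserstein gradient descent for Gaussian
# variational inference on a Gaussian target pi proportional to exp(-V), V quadratic.
# The update form is an assumption (Lambert et al. style), not the paper's RAdaGD.
import numpy as np

d = 2
mu = np.array([1.0, -1.0])                       # target mean
P = np.array([[2.0, 0.5], [0.5, 1.0]])           # target precision: V(x) = 0.5 (x - mu)^T P (x - mu)

m, Sigma = np.zeros(d), np.eye(d)                # variational Gaussian N(m, Sigma)
lam = 0.1                                        # fixed step size; RAdaGD would adapt this

for _ in range(300):
    grad_m = P @ (m - mu)                        # E_q[grad V], closed form for quadratic V
    M = P - np.linalg.inv(Sigma)                 # E_q[hess V] - Sigma^{-1}
    m = m - lam * grad_m
    Sigma = (np.eye(d) - lam * M) @ Sigma @ (np.eye(d) - lam * M)

# m approaches mu and Sigma approaches P^{-1}, the exact posterior in this toy case.
```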
Local geodesic smoothness framework for broader function classes

The authors establish that their convergence analysis relies on local geodesic smoothness rather than global L-smoothness, which broadens the class of applicable functions. They prove that every twice continuously differentiable function on a complete Riemannian manifold satisfies local geodesic smoothness; one common way to formalize this condition is sketched after this entry.

10 retrieved papers
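For reference, one common way to formalize local geodesic smoothness is given below; the paper's precise definition may differ in its quantifiers and constants, so this should be read only as an illustrative formulation.

```latex
% One illustrative formalization of local geodesic smoothness; the paper's
% exact definition may differ.
\[
  f\bigl(\exp_x(v)\bigr) \;\le\; f(x)
  + \bigl\langle \operatorname{grad} f(x),\, v \bigr\rangle_x
  + \tfrac{L_U}{2}\, \lVert v \rVert_x^2
  \quad \text{for all } x \in U,\ v \in T_x\mathcal{M} \text{ with } \exp_x(tv) \in U,\ t \in [0,1],
\]
where $U \subseteq \mathcal{M}$ is a neighborhood (for instance a geodesically convex ball)
and $L_U > 0$ may depend on $U$. Global geodesic $L$-smoothness is the special case
$U = \mathcal{M}$ with a single constant; for $f \in C^2$ on a complete manifold such an
$L_U$ exists on every compact $U$, since $\operatorname{Hess} f$ is bounded there.
```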

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: RAdaGD: Adaptive gradient descent on Riemannian manifolds

Contribution: First convergence guarantee for GVI without L-smoothness

Contribution: Local geodesic smoothness framework for broader function classes