Converge Faster, Talk Less: Hessian-Informed Federated Zeroth-Order Optimization

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Zeroth-Order Optimization, Federated Optimization, Hessian
Abstract:

Zeroth-order (ZO) optimization enables dimension-free communication in federated learning (FL), making it attractive for fine-tuning large language models (LLMs) thanks to significant communication savings. However, existing ZO-FL methods largely overlook curvature information, despite its well-established benefits for convergence acceleration. To address this, we propose HiSo, a Hessian-informed ZO federated optimization method that accelerates convergence by leveraging global diagonal Hessian approximations while strictly preserving scalar-only communication, without transmitting any second-order information. Theoretically, for non-convex functions, we show that HiSo can achieve an accelerated convergence rate that is independent of the Lipschitz constant L and the model dimension d under suitable Hessian approximation assumptions, offering a plausible explanation for the observed phenomenon of ZO convergence being much faster than its worst-case O(d) bound. Empirically, across diverse LLM fine-tuning benchmarks, HiSo delivers a 1–5× speedup in communication rounds over existing state-of-the-art ZO-FL baselines. This superior convergence not only cuts communication costs but also provides strong empirical evidence that Hessian information acts as an effective accelerator in federated ZO optimization settings.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes HiSo, a federated zeroth-order optimization method that leverages diagonal Hessian approximations to accelerate convergence while maintaining scalar-only communication. It occupies the 'Scalar-Only Communication Frameworks with Hessian Acceleration' leaf, which contains only three papers including this work. This represents a sparse research direction within the broader taxonomy of 23 papers, suggesting the intersection of dimension-free communication and Hessian-informed federated zeroth-order methods remains relatively unexplored compared to adjacent areas like centralized Hessian-aware methods or distributed consensus algorithms.

The taxonomy reveals neighboring directions including incremental Hessian estimation for federated zeroth-order optimization and Hessian approximation methods using compression or sketching. The original paper diverges from these by avoiding any second-order information transmission, contrasting with approaches like Hessian-weighted aggregation or eigenvector sharing that require richer communication primitives. Its closest structural neighbors are centralized Hessian-aware zeroth-order methods, which achieve similar convergence benefits but lack the federated communication constraints. The taxonomy boundaries clarify that HiSo sits at the intersection of federated learning efficiency demands and curvature exploitation, distinct from pure gradient-free methods without second-order awareness.

Among the three analyzed contributions, the core HiSo algorithm examined ten candidate papers, with two appearing to provide overlapping prior work. The dimension-independent convergence rate contribution examined eight candidates, with one potentially refuting its novelty claims. The generalized scalar-only communication framework examined only two candidates with no clear refutations. Given the limited search scope of twenty total candidates examined, these statistics suggest the HiSo algorithm and convergence analysis face more substantial prior work overlap than the communication framework abstraction. The small candidate pool indicates these findings reflect top-semantic-match proximity rather than exhaustive field coverage.

Based on examination of twenty semantically related papers, the work appears to occupy a relatively sparse research niche at the intersection of federated learning, zeroth-order optimization, and Hessian acceleration. The scalar-only communication framework shows less prior work overlap, while the algorithmic and theoretical contributions encounter more existing research. The analysis provides initial context but cannot definitively assess novelty given the constrained search scope and the field's evolving nature around communication-efficient federated optimization for large models.

Taxonomy

Core-task Taxonomy Papers: 23
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Papers: 3

Research Landscape Overview

Core task: Hessian-informed zeroth-order optimization in federated learning. The field combines gradient-free optimization with distributed learning, structured around four main branches. Federated Learning with Hessian-Informed Zeroth-Order Methods addresses privacy-preserving distributed training where clients cannot share gradients but can exploit curvature information, with works exploring scalar-only communication frameworks and Hessian-accelerated aggregation strategies like Hessian Weighted Aggregation[5]. Centralized Hessian-Aware Zeroth-Order Methods focus on single-machine settings where Hessian approximations improve convergence, exemplified by Hessian Aware Zeroth[1] and techniques for flat minima discovery such as Zeroth Order Flat Minima[2]. Distributed and Multi-Agent Zeroth-Order Optimization tackles coordination challenges in multi-agent systems, including consensus-based approaches like Zeroth Proximal Consensus[17]. Specialized Applications and Extensions cover domain-specific adaptations, from large language model fine-tuning in Hessian Zeroth LLM[7] to adversarial robustness in Hessian Adversarial Attack[23].

Recent activity centers on communication-efficient federated schemes and scalable Hessian approximations. A key tension emerges between methods that transmit full curvature information versus those achieving extreme communication reduction through scalar exchanges, as seen in Flecs[3] and HiSo[16]. Another active direction involves low-rank and subspace techniques like Low Rank Hessian[11] and Subspace Hessian Zeroth[13] that balance computational cost with second-order benefits.

Hessian Federated Zeroth[0] sits within the scalar-only communication cluster, closely aligned with Hessian Scalar Communication[15] and HiSo[16], emphasizing minimal bandwidth overhead while preserving curvature-guided convergence. Compared to neighbors, it appears to prioritize practical federated deployment constraints over the richer but more communication-intensive strategies explored in works like Hessian Eigenvector Sharing[14], reflecting ongoing debates about the optimal trade-off between communication efficiency and convergence acceleration in privacy-sensitive distributed settings.

Claimed Contributions

Generalized scalar-only communication FL framework

The authors introduce a generalized federated learning framework that decouples scalar-only communication from vanilla ZO-SGD, enabling integration of more sophisticated optimization algorithms while maintaining dimension-free communication. This framework extends beyond the limitations of prior work (DeComFL) by supporting various optimization techniques.
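The mechanism that makes scalar-only communication possible can be sketched as follows. This is a minimal single-client simulation under our own assumptions, not the paper's (or DeComFL's) implementation: client and server share a PRNG seed, so the d-dimensional perturbation never crosses the network; only one scalar finite-difference value per round does.

```python
import numpy as np

def zo_scalar_round(f, x, seed, mu=1e-3, lr=0.02):
    """One round of seed-based, scalar-only ZO-SGD (illustrative sketch).

    Client and server share only `seed` and one scalar per round; the
    d-dimensional perturbation is regenerated locally on both sides.
    """
    # Client: regenerate the perturbation from the shared seed,
    # evaluate two losses, and send back a single scalar.
    z = np.random.default_rng(seed).standard_normal(x.shape)
    g = (f(x + mu * z) - f(x - mu * z)) / (2 * mu)  # directional-derivative estimate

    # Server: rebuild the identical z from the same seed and apply the update.
    z_server = np.random.default_rng(seed).standard_normal(x.shape)
    return x - lr * g * z_server

# Usage: minimize a toy quadratic with scalar-only "communication".
# Note the small learning rate: plain ZO-SGD needs lr on the order of 1/d,
# which is exactly the dimension dependence HiSo targets.
f = lambda v: float(np.sum(v ** 2))
x = np.ones(20)
for t in range(300):
    x = zo_scalar_round(f, x, seed=t)
```

The key design point is that the update rule is a scalar times a reproducible random vector, which is why the framework can swap in richer optimizers (momentum, preconditioning) without changing what is transmitted.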

2 retrieved papers

HiSo algorithm for Hessian-informed federated ZO optimization

The authors propose HiSo, a novel federated optimization method that leverages global diagonal Hessian approximations to accelerate convergence while strictly preserving scalar-only communication. The method captures curvature information without transmitting any Hessian-related data, achieving significant speedups over existing ZO-FL baselines.
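The report does not reproduce HiSo's estimator, so the following is an illustrative stand-in rather than the paper's method: we estimate a diagonal Hessian by coordinate-wise second differences (an O(d)-evaluation shortcut HiSo itself would avoid), then perturb and update along H^{-1/2}-scaled directions. That is one standard way to inject diagonal curvature into a ZO step while the exchanged quantity remains a single scalar.

```python
import numpy as np

def diag_hessian_estimate(f, x, delta=1e-2):
    """Finite-difference diagonal Hessian estimate (illustrative stand-in;
    HiSo avoids this O(d) evaluation cost with its own approximation)."""
    d = x.size
    fx = f(x)
    h = np.empty(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = delta
        h[i] = (f(x + e) - 2 * fx + f(x - e)) / delta ** 2
    return np.maximum(h, 1e-6)  # clip away non-positive curvature

def hiso_like_step(f, x, h, seed, mu=1e-3, lr=0.045):
    """ZO step that perturbs AND updates along H^{-1/2}-scaled directions.
    In an FL deployment only the scalar g and the seed would be exchanged."""
    z = np.random.default_rng(seed).standard_normal(x.shape)
    p = z / np.sqrt(h)                               # preconditioned direction
    g = (f(x + mu * p) - f(x - mu * p)) / (2 * mu)   # the scalar "communication"
    return x - lr * g * p

# Ill-conditioned toy quadratic: curvature differs by 100x across coordinates.
scales = np.array([100.0] * 5 + [1.0] * 15)
f = lambda v: float(np.sum(scales * v ** 2))
x = np.ones(20)
h = diag_hessian_estimate(f, x)
for t in range(400):
    x = hiso_like_step(f, x, h, seed=t)
```

Preconditioning both the perturbation and the update makes the dynamics equivalent to isotropic ZO-SGD on a rescaled, well-conditioned problem, which is the intuition behind curvature information removing the Lipschitz-constant dependence from the rate.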

10 retrieved papers
Can Refute

Dimension-independent convergence rate for non-convex federated ZO optimization

The authors establish theoretical convergence guarantees showing that HiSo achieves a rate independent of both model dimension d and Lipschitz constant L under Hessian approximation assumptions. This represents the first dimension-independent convergence result for zeroth-order methods in federated learning and extends theoretical guarantees to multiple local updates.
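For context, a schematic contrast under the standard non-convex L-smooth setting (not the paper's exact theorem statement): classical analyses of zeroth-order gradient descent with two-point estimators give bounds whose constants scale with both the dimension and the smoothness constant, e.g.

\[
\min_{0 \le t < T} \|\nabla f(x_t)\|^2 \;=\; \mathcal{O}\!\left(\frac{d \, L \, \bigl(f(x_0) - f^\ast\bigr)}{T}\right),
\]

so iteration complexity grows linearly in \(d\). The claimed HiSo result replaces the \(d \cdot L\) factor with quantities controlled by the quality of the diagonal Hessian approximation; that replacement is what "dimension-independent" refers to here.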

8 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Generalized scalar-only communication FL framework

The authors introduce a generalized federated learning framework that decouples scalar-only communication from vanilla ZO-SGD, enabling integration of more sophisticated optimization algorithms while maintaining dimension-free communication. This framework extends beyond the limitations of prior work (DeComFL) by supporting various optimization techniques.

Contribution

HiSo algorithm for Hessian-informed federated ZO optimization

The authors propose HiSo, a novel federated optimization method that leverages global diagonal Hessian approximations to accelerate convergence while strictly preserving scalar-only communication. The method captures curvature information without transmitting any Hessian-related data, achieving significant speedups over existing ZO-FL baselines.

Contribution

Dimension-independent convergence rate for non-convex federated ZO optimization

The authors establish theoretical convergence guarantees showing that HiSo achieves a rate independent of both model dimension d and Lipschitz constant L under Hessian approximation assumptions. This represents the first dimension-independent convergence result for zeroth-order methods in federated learning and extends theoretical guarantees to multiple local updates.