The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: One-Shot Federated Learning, data-free aggregation, Gaussian Discriminant Heads, Knowledge Distillation
Abstract:

Classical federated learning relies on a multi-round iterative process of model exchange and aggregation between server and clients, incurring high communication costs and privacy risks from repeated model transmissions. In contrast, one-shot federated learning (OFL) alleviates these limitations by reducing communication to a single round, thereby lowering overhead and improving practical deployability. Nevertheless, most existing one-shot approaches remain impractical or constrained: they often depend on the availability of a public dataset, assume homogeneous client models, or require uploading additional data or model information. To overcome these issues, we introduce the Gaussian-Head OFL (GH-OFL) family, a suite of one-shot federated methods that assume class-conditional Gaussianity of pretrained embeddings. Clients transmit only sufficient statistics (per-class counts and first- and second-order moments), and the server builds heads via three components: (i) closed-form Gaussian heads (NB/LDA/QDA) computed directly from the received statistics; (ii) FisherMix, a linear head with a cosine margin trained on synthetic samples drawn in an estimated Fisher subspace; and (iii) Proto-Hyper, a lightweight low-rank residual head that refines the Gaussian logits via knowledge distillation on those synthetic samples. In our experiments, GH-OFL methods deliver state-of-the-art robustness and accuracy under strong non-IID skew while remaining strictly data-free.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces the Gaussian-Head OFL (GH-OFL) family, which aggregates per-class sufficient statistics (counts, means, covariances) from clients to construct classification heads in a single communication round. According to the taxonomy, this work resides in the 'Statistical Aggregation from Client Summaries' leaf, which contains four papers total. This leaf sits within the broader 'Core One-Shot Aggregation Mechanisms' branch, indicating a moderately populated research direction focused on fundamental aggregation strategies rather than domain-specific or privacy-centric extensions.

The taxonomy reveals neighboring leaves addressing alternative aggregation paradigms: 'Model Parameter and Ensemble Aggregation' (three papers) directly combines trained parameters, while 'Knowledge Distillation-Based Aggregation' (three papers) uses synthetic data and distillation. The scope note for the current leaf explicitly excludes methods using gradients or distillation, positioning GH-OFL's statistical approach as distinct from these parameter-centric or distillation-driven strategies. Nearby branches cover domain-specific adaptations (graph data, medical imaging) and system-level concerns (hierarchical architectures, secure aggregation), suggesting the core aggregation space remains less crowded than privacy or application-focused areas.

Among the three contributions analyzed, the overall GH-OFL framework was compared against five candidates and yielded one refutable match, indicating that some prior work on statistical one-shot aggregation exists within the limited search scope. The closed-form Gaussian heads contribution was compared against ten candidates with zero refutations, suggesting this specific technique may be less directly anticipated in the surveyed literature. The FisherMix and Proto-Hyper heads were not examined against any candidates, leaving their novelty unassessed. These statistics reflect a top-15 semantic search, not an exhaustive review, so additional overlapping work may exist beyond the examined set.

Given the limited search scope (15 candidates total), the analysis suggests moderate novelty: the paper occupies a sparsely populated taxonomy leaf and most contributions show minimal direct refutation among examined papers. However, the presence of one refutable match for the core framework indicates that statistical aggregation from client summaries is an established direction. A broader literature search or citation network analysis would be needed to confirm whether the Gaussian-head formulation and FisherMix training represent substantive advances over existing statistical one-shot methods.

Taxonomy

Core-task Taxonomy Papers: 26
Claimed Contributions: 3
Contribution Candidate Papers Compared: 15
Refutable Papers: 1

Research Landscape Overview

Core task: one-shot federated learning from client global statistics. This emerging paradigm seeks to train or aggregate models by collecting summary statistics from distributed clients in a single communication round, thereby minimizing overhead and latency. The field organizes into several main branches.

Core One-Shot Aggregation Mechanisms explores fundamental techniques for combining client-side summaries, ranging from statistical aggregation methods like Turbo-aggregate[6] and ensemble-based approaches such as Ensemble Optimization[1] to Bayesian frameworks exemplified by Calibrated Bayesian[7]. Domain-Specific One-Shot Federated Learning adapts these ideas to specialized settings, including graph-based tasks (Federated Graph One-Shot[2], FedLPA[3], Personalized Graph Learning[4]) and vision or language domains that leverage pretrained models (Pretrained Models Role[15], Global Prompt Refinement[24]). System Architecture and Communication Efficiency addresses practical deployment concerns such as dropout resilience (Dropout-Resilient[22]), scalability (Scalable Analytic[19]), and edge-AI integration (FLEdge-AI[26]). Privacy and Security in One-Shot Federated Learning investigates differential-privacy guarantees (One-Shot Private Aggregation[10], OPSA[9]), long-term privacy risks (Long-term Privacy[16]), and robustness against Byzantine attacks (UAV Byzantine[13]).

Recent work highlights a tension between statistical fidelity and communication constraints: some methods prioritize expressive client representations to preserve heterogeneity, while others emphasize minimal bandwidth and privacy budgets. Within the statistical aggregation cluster, Gaussian-Head OFL[0] focuses on leveraging Gaussian-based head statistics for efficient one-shot fusion, positioning itself alongside Global Feature Statistics[12] and Pretrained Models Role[15], which similarly exploit feature-level summaries or pretrained embeddings to reduce the burden of raw data exchange. Compared to MOHFL[5], which targets multi-objective optimization in heterogeneous settings, Gaussian-Head OFL[0] emphasizes a streamlined aggregation of distributional parameters. This line of inquiry remains active, with open questions around how best to balance model expressiveness, privacy overhead, and robustness to non-IID data when only a single round of client communication is permitted.

Claimed Contributions

Gaussian-Head OFL (GH-OFL) family of one-shot federated learning methods

The authors propose a family of one-shot federated learning methods where clients transmit only sufficient statistics (per-class counts and first/second-order moments) and the server builds classification heads without requiring public datasets, homogeneous client models, or additional data uploads. The approach assumes class-conditional Gaussian distributions of pretrained embeddings.

5 retrieved papers
Can Refute
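As a concrete illustration of the pipeline this contribution describes, the following is a minimal NumPy sketch of the server-side merge of client sufficient statistics, assuming each client uploads per-class tuples of (count, sum of embeddings, sum of outer products). The function name and payload layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def merge_client_stats(client_stats, d):
    """Merge per-class sufficient statistics from all clients.

    client_stats: list of dicts mapping class label -> (n, s1, s2), where
      n  = sample count,
      s1 = sum of embeddings, shape (d,),
      s2 = sum of outer products, shape (d, d).
    Returns a dict: class -> (global count, global mean, global covariance).
    """
    merged = {}
    for stats in client_stats:
        for c, (n, s1, s2) in stats.items():
            if c not in merged:
                merged[c] = [0, np.zeros(d), np.zeros((d, d))]
            # Sufficient statistics are additive across clients.
            merged[c][0] += n
            merged[c][1] += s1
            merged[c][2] += s2
    heads = {}
    for c, (n, s1, s2) in merged.items():
        mu = s1 / n
        cov = s2 / n - np.outer(mu, mu)   # E[x x^T] - mu mu^T
        heads[c] = (n, mu, cov)
    return heads
```

Because the statistics are additive, the merged mean and covariance are exactly those of the pooled data, regardless of how samples are split across clients.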
Closed-form Gaussian heads computed from client statistics

The authors develop closed-form discriminant heads (Naive Bayes, Linear Discriminant Analysis, and Quadratic Discriminant Analysis) that are instantiated directly from aggregated client statistics. These heads incorporate Fisher-guided pipelines with targeted shrinkage and compressed random-projection sketches while remaining strictly data-free.

10 retrieved papers
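To make the "closed-form from aggregated statistics" idea concrete, here is a hedged sketch of an LDA head built from per-class (count, mean, covariance) tuples. A simple identity-blend shrinkage stands in for the paper's targeted shrinkage, and the random-projection sketching is omitted; names and defaults are assumptions.

```python
import numpy as np

def lda_head(class_stats, shrinkage=0.1):
    """Closed-form LDA head from aggregated per-class statistics.

    class_stats: dict class -> (count n_c, mean mu_c (d,), covariance S_c (d, d)).
    `shrinkage` blends the pooled covariance with a scaled identity
    (a simple stand-in for targeted shrinkage).
    Returns (W, b) such that logits = X @ W.T + b.
    """
    classes = sorted(class_stats)
    n_total = sum(class_stats[c][0] for c in classes)
    d = class_stats[classes[0]][1].shape[0]
    # Pooled within-class covariance, weighted by class counts.
    sigma = sum(n * S for n, _, S in class_stats.values()) / n_total
    sigma = (1 - shrinkage) * sigma + shrinkage * (np.trace(sigma) / d) * np.eye(d)
    prec = np.linalg.inv(sigma)
    # LDA discriminant: x^T P mu_c - 0.5 mu_c^T P mu_c + log pi_c.
    W = np.stack([prec @ class_stats[c][1] for c in classes])
    b = np.array([
        -0.5 * class_stats[c][1] @ prec @ class_stats[c][1]
        + np.log(class_stats[c][0] / n_total)
        for c in classes
    ])
    return W, b
```

NB and QDA variants follow the same pattern, replacing the pooled covariance with per-class diagonal or full covariances respectively.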
FisherMix and Proto-Hyper trainable heads using synthetic samples

The authors introduce two trainable head architectures: FisherMix (a cosine-margin linear head) and Proto-Hyper (a low-rank residual head). Both are trained exclusively on synthetic samples generated in a Fisher subspace using only the aggregated statistics, without requiring any public dataset or real client data.

0 retrieved papers
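Although no candidates were retrieved for this contribution, its data-free training recipe can be sketched: estimate a Fisher subspace from the aggregated statistics, then draw synthetic samples from each class-conditional Gaussian. The sketch below covers only these two stages (not the cosine-margin or low-rank residual-head training); all names, the ridge term, and the eigen-solver choice are assumptions.

```python
import numpy as np

def fisher_subspace(class_stats, k):
    """Top-k directions of S_w^{-1} S_b: an estimated Fisher subspace."""
    n_total = sum(n for n, _, _ in class_stats.values())
    d = next(iter(class_stats.values()))[1].shape[0]
    mu_g = sum(n * mu for n, mu, _ in class_stats.values()) / n_total
    Sw = sum(n * S for n, _, S in class_stats.values()) / n_total
    Sb = sum(n * np.outer(mu - mu_g, mu - mu_g)
             for n, mu, _ in class_stats.values()) / n_total
    # Small ridge keeps the within-class solve stable.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(-evals.real)[:k]
    return evecs[:, order].real            # (d, k) projection matrix

def sample_synthetic(class_stats, per_class, seed=0):
    """Draw synthetic embeddings from each class-conditional Gaussian."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for c, (n, mu, S) in sorted(class_stats.items()):
        L = np.linalg.cholesky(S + 1e-6 * np.eye(len(mu)))
        X.append(mu + rng.normal(size=(per_class, len(mu))) @ L.T)
        y += [c] * per_class
    return np.vstack(X), np.array(y)
```

In the paper's framing, the trainable heads would then be fit on these synthetic samples (optionally projected into the Fisher subspace), keeping the whole procedure free of real client data.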

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Gaussian-Head OFL (GH-OFL) family of one-shot federated learning methods

The authors propose a family of one-shot federated learning methods where clients transmit only sufficient statistics (per-class counts and first/second-order moments) and the server builds classification heads without requiring public datasets, homogeneous client models, or additional data uploads. The approach assumes class-conditional Gaussian distributions of pretrained embeddings.

Contribution

Closed-form Gaussian heads computed from client statistics

The authors develop closed-form discriminant heads (Naive Bayes, Linear Discriminant Analysis, and Quadratic Discriminant Analysis) that are instantiated directly from aggregated client statistics. These heads incorporate Fisher-guided pipelines with targeted shrinkage and compressed random-projection sketches while remaining strictly data-free.

Contribution

FisherMix and Proto-Hyper trainable heads using synthetic samples

The authors introduce two trainable head architectures: FisherMix (a cosine-margin linear head) and Proto-Hyper (a low-rank residual head). Both are trained exclusively on synthetic samples generated in a Fisher subspace using only the aggregated statistics, without requiring any public dataset or real client data.