Back to Square Roots: An Optimal Bound on the Matrix Factorization Error for Multi-Epoch Differentially Private SGD

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Matrix Factorization, Differential Privacy, Machine Learning
Abstract:

Matrix factorization mechanisms for differentially private training have emerged as a promising approach to improve model utility under privacy constraints. In practical settings, models are typically trained over multiple epochs, requiring matrix factorizations that account for repeated participation. Existing theoretical upper and lower bounds on multi-epoch factorization error leave a significant gap. In this work, we introduce a new explicit factorization method, Banded Inverse Square Root (BISR), which imposes a banded structure on the inverse correlation matrix. This factorization enables us to derive an explicit and tight characterization of the multi-epoch error. We further prove that BISR achieves asymptotically optimal error by matching the upper and lower bounds. Empirically, BISR performs on par with state-of-the-art factorization methods, while being simpler to implement, computationally efficient, and easier to analyze.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces the Banded Inverse Square Root (BISR) factorization method for differentially private multi-epoch training, contributing explicit error characterization and asymptotic optimality proofs. It resides in the 'Optimal Factorization Theory and Bounds' leaf, which contains only two papers total, indicating a relatively sparse research direction focused on theoretical foundations. This leaf sits within the broader 'Core Matrix Factorization Mechanisms for DP-SGD' branch, which encompasses five distinct subtopics addressing different aspects of noise injection and privacy accounting for stochastic gradient descent.

The taxonomy reveals neighboring work in 'Unified and Amplified Factorization Frameworks' (three papers) and 'Adaptive Stream and Online Factorization' (three papers), suggesting the field has diversified into practical extensions beyond pure optimality theory. The 'Learning Rate Scheduling and Correlated Noise' leaf (two papers) addresses complementary concerns about training dynamics under privacy constraints. The scope notes clarify that this leaf excludes empirical-only comparisons and application-specific adaptations, positioning the work as foundational theory rather than domain-specific engineering. The broader taxonomy shows substantial activity in federated learning and LoRA adaptation (seventeen papers combined), indicating the field's center of gravity has shifted toward distributed and parameter-efficient settings.

Among the three contributions analyzed, the BISR factorization method was compared against ten candidates with zero refutations, suggesting novelty in its explicit banded structure. The asymptotic optimality claim was compared against ten candidates, one of which was found refutable, indicating that some prior theoretical work on optimal bounds exists within the limited search scope. The BandInvMF optimization method was compared against seven candidates with no refutations, suggesting its low-memory focus may be less explored. Overall, twenty-seven candidates were examined across all contributions, a modest search scale that captures core theoretical work but may not reflect the full landscape of empirical implementations or domain-specific adaptations.

Based on the limited search scope of twenty-seven semantically similar papers, the work appears to occupy a theoretically focused niche within a field increasingly oriented toward practical federated and parameter-efficient methods. The single refutation for the optimality bounds suggests some theoretical overlap exists, though the explicit BISR construction and low-memory optimization appear less directly anticipated. The sparse population of the 'Optimal Factorization Theory' leaf (two papers) contrasts with denser application-oriented branches, suggesting this contribution addresses a foundational gap rather than competing in a crowded subfield.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Paper: 1

Research Landscape Overview

Core task: matrix factorization for differentially private multi-epoch training. The field addresses how to inject noise efficiently when training models over multiple passes through sensitive data, balancing privacy guarantees with utility. The taxonomy reveals several main branches:

- Core Matrix Factorization Mechanisms for DP-SGD: foundational noise-injection strategies and optimal factorization bounds.
- Low-Rank Adaptation with Differential Privacy: parameter-efficient fine-tuning methods such as LoRA under privacy constraints.
- Federated Learning with Privacy-Preserving Low-Rank Methods: distributed training combined with low-rank updates to reduce communication and preserve privacy.
- Privacy-Preserving Matrix Decomposition and SVD: secure computation of singular value decompositions and related factorizations.
- Application-Specific Privacy-Preserving Methods: these techniques tailored to domains like healthcare and recommendation systems.
- Surveys and Peripheral Topics: broader context.

Representative works include Multi-Epoch Matrix Factorization[11], which laid early groundwork for correlated noise across epochs, and Amplified Banded Matrix[5], which refined banded structures for tighter privacy accounting. A particularly active line of research focuses on optimal factorization theory and practical banded-matrix constructions. Square Roots Optimal[0] sits within this branch, emphasizing theoretical bounds on how to factor covariance matrices to minimize privacy loss over many training epochs. It contrasts with Banded Square Root[2], which also targets banded structures but may prioritize computational efficiency or specific sparsity patterns. Meanwhile, works such as Improving LoRA Privacy[1] and Private LoRA Federated[3] explore how low-rank adaptation can be combined with differential privacy in both centralized and federated settings, trading off rank constraints against noise overhead.

Another cluster, including Hassle-free Strong Privacy[6] and Memory-Efficient Correlated Noise[37], addresses practical implementation challenges such as memory footprint and ease of deployment. Square Roots Optimal[0] contributes to the theoretical foundation by characterizing when square-root factorizations achieve optimal privacy-utility trade-offs, complementing empirical advances in banded and low-rank methods and helping to unify understanding across these diverse branches.

Claimed Contributions

Banded Inverse Square Root (BISR) factorization method

The authors propose BISR, a novel matrix factorization approach for differentially private training that enforces a banded structure on the inverse correlation matrix C^(-1) rather than on C itself. This design enables explicit error characterization, computational efficiency via convolution operations, and simpler implementation compared to existing methods.

10 retrieved papers
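The efficiency claim above rests on a general property of banded lower-triangular Toeplitz matrices: if C^(-1) has bandwidth b, applying it to an i.i.d. noise stream is a length-b convolution per step. The sketch below illustrates only that generic property; the function name and the band coefficients are hypothetical and are not the paper's BISR construction, whose coefficients come from its square-root analysis.

```python
import numpy as np

def banded_correlated_noise(coeffs, n_steps, sigma=1.0, seed=None):
    """Stream the entries of w = C^{-1} z, where z is i.i.d. Gaussian and
    C^{-1} is banded lower-triangular Toeplitz with first-column band
    `coeffs`. Each step is one dot product over the last len(coeffs)
    samples, so memory stays O(bandwidth) instead of O(n_steps)."""
    rng = np.random.default_rng(seed)
    band = np.asarray(coeffs, dtype=float)
    history = np.zeros(len(band))      # most recent i.i.d. sample first
    out = np.empty(n_steps)
    for t in range(n_steps):
        history = np.roll(history, 1)  # shift the sample window by one
        history[0] = rng.normal(scale=sigma)
        out[t] = band @ history        # length-b convolution at step t
    return out
```

Because the output is linear in the band coefficients, scaling every coefficient scales the noise by the same factor, which makes the convolutional structure easy to sanity-check.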
Asymptotically optimal multi-epoch factorization error bounds

The authors establish tight theoretical guarantees by deriving both improved lower bounds (Theorem 3) and explicit upper bounds (Theorem 4) on multi-epoch factorization error. They prove that BISR achieves the optimal asymptotic rate, closing a significant theoretical gap in the literature on matrix factorization mechanisms for differential privacy.

10 retrieved papers
Can Refute
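The report does not reproduce the error bounds of Theorems 3 and 4. As a rough illustration of the error quantity being bounded, the sketch below evaluates the classical single-participation (one-epoch) case for the square-root factorization of the prefix-sum matrix A, using the known Taylor coefficients of (1 - x)^(-1/2); the multi-epoch error that BISR controls is a strictly harder quantity not computed here. Function names are hypothetical.

```python
import numpy as np

def sqrt_band(n):
    # Toeplitz coefficients of A^{1/2}, where A is the all-ones lower
    # triangular (prefix-sum) matrix: the Taylor coefficients of
    # (1 - x)^(-1/2), via the recurrence c_k = c_{k-1} * (2k - 1) / (2k).
    c = np.empty(n)
    c[0] = 1.0
    for k in range(1, n):
        c[k] = c[k - 1] * (2 * k - 1) / (2 * k)
    return c

def max_error(n):
    # For the factorization A = B C with B = C = A^{1/2}, the
    # single-participation max error is (max row 2-norm of B) times
    # (max column 2-norm of C); by the Toeplitz structure both norms
    # equal ||c||_2, so the error is ||c||_2^2.
    c = sqrt_band(n)
    return float(c @ c)
```

The value is exactly 1 at n = 1 and grows only logarithmically in n, which is what makes square-root-style factorizations attractive as a baseline for the tighter multi-epoch analysis claimed above.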
BandInvMF optimization method for low-memory regime

The authors introduce BandInvMF, a practical optimization approach that numerically optimizes the coefficients of the banded inverse correlation matrix for scenarios with small bandwidth constraints. This method maintains the banded inverse Toeplitz structure while achieving error rates comparable to state-of-the-art factorization methods through efficient implementation.

7 retrieved papers
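The report does not describe BandInvMF's optimizer. As a toy stand-in under stated assumptions (prefix-sum workload, single participation, bandwidth 2, and a crude grid search rather than the paper's numerical method), the sketch below shows the shape of the problem: tuning the one free coefficient of a banded inverse band [1, beta] already improves on the identity factorization. All names and the grid are hypothetical.

```python
import numpy as np

def band_error(beta, n):
    # C^{-1} is lower-triangular Toeplitz with band [1, beta]; with the
    # prefix-sum matrix A, set B = A C^{-1}. The single-participation
    # error is (max row 2-norm of B) times the sensitivity, taken here
    # as the max column 2-norm of C = (C^{-1})^{-1}.
    Cinv = np.eye(n) + beta * np.eye(n, k=-1)
    A = np.tril(np.ones((n, n)))
    B = A @ Cinv
    C = np.linalg.inv(Cinv)
    sens = np.linalg.norm(C, axis=0).max()
    return np.linalg.norm(B, axis=1).max() * sens

def best_beta(n, grid=np.linspace(-0.9, 0.0, 91)):
    # Crude 1-D grid search over the single free band coefficient.
    errs = [band_error(b, n) for b in grid]
    i = int(np.argmin(errs))
    return float(grid[i]), float(errs[i])
```

At n = 8 the identity factorization (beta = 0) gives error sqrt(8), about 2.83, and the grid search finds a negative beta with strictly lower error, mirroring in miniature the gains BandInvMF targets at small bandwidths.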

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Banded Inverse Square Root (BISR) factorization method

Contribution

Asymptotically optimal multi-epoch factorization error bounds

Contribution

BandInvMF optimization method for low-memory regime