Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: continual learning, diffusion models, catastrophic forgetting, image generation, elastic weight consolidation, generative replay
Abstract:

Catastrophic forgetting remains a central obstacle for continual learning in neural models. Popular approaches---replay and elastic weight consolidation (EWC)---have limitations: replay requires a strong generator and is prone to distributional drift, while EWC implicitly assumes a shared optimum across tasks and typically uses a diagonal Fisher approximation. In this work, we study the gradient geometry of diffusion models, which can already produce high-quality replay data. We provide theoretical and empirical evidence that, in the low signal-to-noise ratio (SNR) regime, per-sample gradients become strongly collinear, yielding an empirical Fisher that is effectively rank-1 and aligned with the mean gradient. Leveraging this structure, we propose a rank-1 variant of EWC that is as cheap as the diagonal approximation yet captures the dominant curvature direction. We pair this penalty with a replay-based approach to encourage parameter sharing across tasks while mitigating drift. On class-incremental image generation datasets (MNIST, FashionMNIST, CIFAR-10, ImageNet-1k), our method consistently improves average FID and reduces forgetting relative to replay-only and diagonal-EWC baselines. In particular, forgetting is nearly eliminated on MNIST and FashionMNIST and is roughly halved on ImageNet-1k. These results suggest that diffusion models admit an approximately rank-1 Fisher. With a better Fisher estimate, EWC becomes a strong complement to replay: replay encourages parameter sharing across tasks, while EWC effectively constrains replay-induced drift.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a rank-1 variant of elastic weight consolidation (EWC) for continual learning in diffusion models, grounded in theoretical and empirical analysis of gradient geometry in the low signal-to-noise ratio regime. It occupies the Fisher-Based Consolidation leaf within the Regularization and Consolidation Methods branch, which contains only two papers total. This represents a relatively sparse research direction compared to more crowded areas like Concept-Incremental Learning (six papers) or Generative Replay subcategories. The work combines this rank-1 penalty with generative distillation to balance parameter sharing and drift mitigation across sequential tasks.

The Fisher-Based Consolidation leaf sits within the broader Regularization and Consolidation Methods branch, which also includes Consistency and Stability Regularization approaches. Neighboring branches include Generative Replay Methods (with five subcategories spanning classifier-guided, federated, and audio replay) and Architectural and Structural Approaches (covering dynamic expansion and model merging). The taxonomy's scope note clarifies that this branch focuses on regularization techniques rather than replay or architectural expansion, while the exclude note distinguishes it from methods relying primarily on generative replay. The paper's hybrid approach—combining rank-1 EWC with replay-based distillation—bridges these traditionally separate methodological families.

Among twenty-four candidates examined, the analysis identified two refutable pairs for the rank-1 EWC penalty contribution, while the theoretical characterization of rank-1 Fisher structure and the combined framework showed no clear refutations across eight and nine candidates respectively. The limited search scope means these statistics reflect top-K semantic matches and citation expansion, not exhaustive coverage. The rank-1 EWC penalty appears to have more substantial prior work overlap, whereas the gradient geometry analysis and the integrated framework combining rank-1 penalty with generative distillation show stronger novelty signals within the examined candidate set.

Based on the limited literature search covering twenty-four candidates, the work appears to make meaningful contributions in a relatively sparse research direction. The theoretical analysis of gradient collinearity in diffusion models and the combined regularization-replay framework show novelty signals, though the rank-1 EWC penalty itself has identifiable prior work. The taxonomy context suggests this sits at an intersection of regularization and replay methods, potentially offering a bridge between these approaches. However, the analysis does not cover the full breadth of continual learning literature beyond the examined candidates.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 2

Research Landscape Overview

Core task: continual learning in diffusion models. The field addresses how diffusion-based generative models can learn new tasks or data distributions sequentially without catastrophically forgetting previously acquired knowledge. The taxonomy reveals several complementary strategies: Generative Replay Methods leverage synthetic samples from earlier tasks to maintain performance, while Regularization and Consolidation Methods constrain parameter updates to preserve critical knowledge. Personalization and Customization branches focus on adapting models to individual user preferences across time, and Class-Incremental Learning tackles the challenge of adding new classes incrementally. Architectural and Structural Approaches explore model design choices that inherently support continual adaptation, whereas Unlearning and Selective Forgetting enable controlled removal of specific information. Domain and Task Adaptation methods handle shifts in data distributions, Application-Specific Methods tailor solutions to particular domains, and Surveys and Overviews provide broader perspectives on the landscape.

Within Regularization and Consolidation Methods, Fisher-based techniques have emerged as a prominent line of work that identifies and protects important parameters using Fisher information. Rank-1 Fisher Diffusion[0] exemplifies this approach by employing low-rank approximations to make Fisher-based consolidation computationally tractable for large diffusion models. This contrasts with replay-centric methods like DDGR[1] and Diffusion Distillation Replay[12], which generate synthetic data to rehearse old tasks, and with architectural strategies such as Dynamic Expansion Diffusion[23] that grow model capacity over time.
A key trade-off across these branches involves balancing memory efficiency, computational cost, and the degree of forgetting: regularization methods like Rank-1 Fisher Diffusion[0] and EWC-Guided Replay[42] avoid storing past data but require careful tuning of consolidation strength, while replay methods offer strong empirical performance at the expense of generation overhead. Rank-1 Fisher Diffusion[0] sits naturally among Fisher-based consolidation works, emphasizing parameter-efficient protection of learned knowledge without relying on stored exemplars or generative replay.

Claimed Contributions

Theoretical and empirical characterization of rank-1 Fisher in diffusion models

The authors prove theoretically and validate empirically that diffusion models exhibit an approximately rank-1 Fisher information matrix in low signal-to-noise ratio regimes. This occurs because per-sample gradients become collinear with their population mean, making the Fisher matrix effectively rank-1 and aligned with the mean gradient direction.

8 retrieved papers
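The collinearity claim above can be illustrated numerically. The sketch below is not the paper's code: it simulates the asserted low-SNR regime with synthetic per-sample gradients sharing a common direction (all dimensions, sample counts, and noise scales are illustrative assumptions), then checks that the empirical Fisher is dominated by a single eigenvalue whose eigenvector aligns with the mean gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 256  # parameter dimension, number of samples (hypothetical)

# Assumed low-SNR structure: g_i ≈ c_i * u for a shared unit direction u,
# plus a small orthogonal perturbation.
u = rng.normal(size=d)
u /= np.linalg.norm(u)
coeffs = rng.normal(loc=1.0, scale=0.1, size=n)
grads = coeffs[:, None] * u[None, :] + 0.01 * rng.normal(size=(n, d))

# Empirical Fisher: F = (1/n) * sum_i g_i g_i^T
F = grads.T @ grads / n

# Rank-1 dominance: the top eigenvalue carries almost all spectral mass,
# and its eigenvector aligns with the (normalized) mean gradient.
eigvals, eigvecs = np.linalg.eigh(F)      # ascending order
mass = eigvals[-1] / eigvals.sum()
top_vec = eigvecs[:, -1]
mean_g = grads.mean(axis=0)
align = abs(top_vec @ mean_g) / np.linalg.norm(mean_g)
print(f"top-eigenvalue mass: {mass:.3f}, alignment with mean grad: {align:.3f}")
```

Under these synthetic assumptions the top eigenvalue captures nearly all of the trace and the alignment is close to 1, matching the report's description of an effectively rank-1 Fisher.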
Rank-1 EWC penalty for continual learning

The authors introduce a rank-1 variant of elastic weight consolidation that exploits the discovered gradient structure in diffusion models. This penalty is computationally efficient (comparable to diagonal approximation) while capturing the dominant curvature direction, unlike the commonly used diagonal Fisher approximation which fails to capture off-diagonal curvature.

7 retrieved papers · Can Refute
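The report does not reproduce the paper's exact penalty, but a rank-1 EWC term of the kind described can be sketched as follows. Assuming the Fisher is approximated by the outer product of a single direction (e.g., the mean gradient) with itself, the usual quadratic consolidation penalty collapses to one inner product, matching the O(d) memory and compute of the diagonal variant; function names and the precise form here are hypothetical.

```python
import numpy as np

def diag_ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """Standard diagonal-EWC quadratic penalty: 0.5*lam*sum_i F_ii*(theta_i - theta*_i)^2."""
    d = theta - theta_star
    return 0.5 * lam * np.sum(fisher_diag * d**2)

def rank1_ewc_penalty(theta, theta_star, mean_grad, lam=1.0):
    """Rank-1 penalty: with F ≈ g g^T the quadratic form collapses to
    a single inner product -- O(d) cost, like the diagonal variant."""
    d = theta - theta_star
    return 0.5 * lam * (mean_grad @ d) ** 2

rng = np.random.default_rng(1)
dim = 8
theta_star = rng.normal(size=dim)          # parameters after the old task
theta = theta_star + 0.1 * rng.normal(size=dim)
g = rng.normal(size=dim)                   # stand-in for the mean gradient

# With F = g g^T, the rank-1 form equals the exact quadratic penalty
# 0.5 * d^T F d, while the diagonal form (keeping only g_i^2) drops
# all off-diagonal curvature.
d = theta - theta_star
exact = 0.5 * d @ np.outer(g, g) @ d
print(rank1_ewc_penalty(theta, theta_star, g), exact,
      diag_ewc_penalty(theta, theta_star, g**2))
```

The comparison at the end makes the claimed advantage concrete: the rank-1 form reproduces the full quadratic exactly when the Fisher is truly rank-1, whereas the diagonal form generally does not.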
Combined rank-1 EWC and generative distillation framework

The authors develop a continual learning approach that pairs their rank-1 EWC penalty with generative distillation to encourage parameter sharing across tasks while constraining replay-induced drift. This combination addresses limitations where EWC alone struggles when task optima are disjoint and replay alone suffers from distributional shift.

9 retrieved papers
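As a toy illustration of how the two terms interact, the sketch below uses simple quadratic losses standing in for the diffusion and distillation objectives (all values hypothetical, not the paper's setup): a replay-style term pulls parameters back toward the old solution, while the rank-1 penalty blocks motion along the old task's dominant Fisher direction but leaves orthogonal directions free to adapt to the new task.

```python
import numpy as np

# Combined objective (names hypothetical):
#   L(theta) = L_new(theta) + alpha * L_replay(theta)
#              + 0.5 * lam * (g @ (theta - theta_star))**2
theta_star = np.array([1.0, 0.0])   # parameters after the old task
new_opt = np.array([0.0, 1.0])      # optimum of the new task
g = np.array([1.0, 0.0])            # dominant Fisher direction of old task
alpha, lam, lr = 0.5, 5.0, 0.1

def grad(theta):
    g_new = theta - new_opt                        # pull toward new task
    g_replay = theta - theta_star                  # replay pulls toward old solution
    g_ewc = lam * (g @ (theta - theta_star)) * g   # rank-1 penalty gradient
    return g_new + alpha * g_replay + g_ewc

theta = theta_star.copy()
for _ in range(200):
    theta = theta - lr * grad(theta)

# Along g (the old task's sensitive direction) theta barely moves;
# orthogonal to g it moves freely toward the new optimum.
print(theta)
```

In this toy, the first coordinate (along g) stays close to the old solution while the second moves most of the way to the new optimum, mirroring the report's framing: replay encourages sharing, and the rank-1 EWC term constrains replay-induced drift along the protected direction.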

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Theoretical and empirical characterization of rank-1 Fisher in diffusion models

Contribution

Rank-1 EWC penalty for continual learning

Contribution

Combined rank-1 EWC and generative distillation framework
