Avoid Catastrophic Forgetting with Rank-1 Fisher from Diffusion Models

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: continual learning, diffusion models, catastrophic forgetting, image generation, elastic weight consolidation, generative replay
Abstract:

Catastrophic forgetting remains a central obstacle for continual learning in neural models. Popular approaches---replay and elastic weight consolidation (EWC)---have limitations: replay requires a strong generator and is prone to distributional drift, while EWC implicitly assumes a shared optimum across tasks and typically uses a diagonal Fisher approximation. In this work, we study the gradient geometry of diffusion models, which can already produce high-quality replay data. We provide theoretical and empirical evidence that, in the low signal-to-noise ratio (SNR) regime, per-sample gradients become strongly collinear, yielding an empirical Fisher that is effectively rank-1 and aligned with the mean gradient. Leveraging this structure, we propose a rank-1 variant of EWC that is as cheap as the diagonal approximation yet captures the dominant curvature direction. We pair this penalty with a replay-based approach to encourage parameter sharing across tasks while mitigating drift. On class-incremental image generation datasets (MNIST, FashionMNIST, CIFAR-10, ImageNet-1k), our method consistently improves average FID and reduces forgetting relative to replay-only and diagonal-EWC baselines. In particular, forgetting is nearly eliminated on MNIST and FashionMNIST and is roughly halved on ImageNet-1k. These results suggest that diffusion models admit an approximately rank-1 Fisher. With a better Fisher estimate, EWC becomes a strong complement to replay: replay encourages parameter sharing across tasks, while EWC effectively constrains replay-induced drift.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a rank-1 variant of elastic weight consolidation (EWC) for continual learning in diffusion models, grounded in theoretical and empirical analysis of gradient geometry in the low signal-to-noise ratio regime. It occupies the Fisher-Based Consolidation leaf within the Regularization and Consolidation Methods branch, which contains only two papers total. This represents a relatively sparse research direction compared to more crowded areas like Concept-Incremental Learning (six papers) or Generative Replay subcategories. The work combines this rank-1 penalty with generative distillation to balance parameter sharing and drift mitigation across sequential tasks.

The Fisher-Based Consolidation leaf sits within the broader Regularization and Consolidation Methods branch, which also includes Consistency and Stability Regularization approaches. Neighboring branches include Generative Replay Methods (with five subcategories spanning classifier-guided, federated, and audio replay) and Architectural and Structural Approaches (covering dynamic expansion and model merging). The taxonomy's scope note clarifies that this branch focuses on regularization techniques rather than replay or architectural expansion, while the exclude note distinguishes it from methods relying primarily on generative replay. The paper's hybrid approach—combining rank-1 EWC with replay-based distillation—bridges these traditionally separate methodological families.

Among twenty-four candidates examined, the analysis identified two refutable pairs for the rank-1 EWC penalty contribution, while the theoretical characterization of rank-1 Fisher structure and the combined framework showed no clear refutations across eight and nine candidates respectively. The limited search scope means these statistics reflect top-K semantic matches and citation expansion, not exhaustive coverage. The rank-1 EWC penalty appears to have more substantial prior work overlap, whereas the gradient geometry analysis and the integrated framework combining rank-1 penalty with generative distillation show stronger novelty signals within the examined candidate set.

Based on the limited literature search covering twenty-four candidates, the work appears to make meaningful contributions in a relatively sparse research direction. The theoretical analysis of gradient collinearity in diffusion models and the combined regularization-replay framework show novelty signals, though the rank-1 EWC penalty itself has identifiable prior work. The taxonomy context suggests this sits at an intersection of regularization and replay methods, potentially offering a bridge between these approaches. However, the analysis does not cover the full breadth of continual learning literature beyond the examined candidates.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 2

Research Landscape Overview

Core task: continual learning in diffusion models. The field addresses how diffusion-based generative models can learn new tasks or data distributions sequentially without catastrophically forgetting previously acquired knowledge. The taxonomy reveals several complementary strategies: Generative Replay Methods leverage synthetic samples from earlier tasks to maintain performance, while Regularization and Consolidation Methods constrain parameter updates to preserve critical knowledge. Personalization and Customization branches focus on adapting models to individual user preferences across time, and Class-Incremental Learning tackles the challenge of adding new classes incrementally. Architectural and Structural Approaches explore model design choices that inherently support continual adaptation, whereas Unlearning and Selective Forgetting enable controlled removal of specific information. Domain and Task Adaptation methods handle shifts in data distributions, Application-Specific Methods tailor solutions to particular domains, and Surveys and Overviews provide broader perspectives on the landscape.

Within Regularization and Consolidation Methods, Fisher-based techniques have emerged as a prominent line of work that identifies and protects important parameters using Fisher information. Rank-1 Fisher Diffusion[0] exemplifies this approach by employing low-rank approximations to make Fisher-based consolidation computationally tractable for large diffusion models. This contrasts with replay-centric methods like DDGR[1] and Diffusion Distillation Replay[12], which generate synthetic data to rehearse old tasks, and with architectural strategies such as Dynamic Expansion Diffusion[23] that grow model capacity over time.
A key trade-off across these branches involves balancing memory efficiency, computational cost, and the degree of forgetting: regularization methods like Rank-1 Fisher Diffusion[0] and EWC-Guided Replay[42] avoid storing past data but require careful tuning of consolidation strength, while replay methods offer strong empirical performance at the expense of generation overhead. Rank-1 Fisher Diffusion[0] sits naturally among Fisher-based consolidation works, emphasizing parameter-efficient protection of learned knowledge without relying on stored exemplars or generative replay.

Claimed Contributions

Theoretical and empirical characterization of rank-1 Fisher in diffusion models

The authors prove theoretically and validate empirically that diffusion models exhibit an approximately rank-1 Fisher information matrix in low signal-to-noise ratio regimes. This occurs because per-sample gradients become collinear with their population mean, making the Fisher matrix effectively rank-1 and aligned with the mean gradient direction.

8 retrieved papers
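The collinearity claim above can be illustrated numerically. The sketch below is not the paper's code: it simulates the asserted low-SNR regime with synthetic per-sample gradients sharing a common direction (all dimensions, sample counts, and noise scales are illustrative assumptions), then checks that the empirical Fisher is dominated by a single eigenvalue whose eigenvector aligns with the mean gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 256  # parameter dimension, number of samples (hypothetical)

# Assumed low-SNR structure: g_i ≈ c_i * u for a shared unit direction u,
# plus a small orthogonal perturbation.
u = rng.normal(size=d)
u /= np.linalg.norm(u)
coeffs = rng.normal(loc=1.0, scale=0.1, size=n)
grads = coeffs[:, None] * u[None, :] + 0.01 * rng.normal(size=(n, d))

# Empirical Fisher: F = (1/n) * sum_i g_i g_i^T
F = grads.T @ grads / n

# Rank-1 dominance: the top eigenvalue carries almost all spectral mass,
# and its eigenvector aligns with the (normalized) mean gradient.
eigvals, eigvecs = np.linalg.eigh(F)      # ascending order
mass = eigvals[-1] / eigvals.sum()
top_vec = eigvecs[:, -1]
mean_g = grads.mean(axis=0)
align = abs(top_vec @ mean_g) / np.linalg.norm(mean_g)
print(f"top-eigenvalue mass: {mass:.3f}, alignment with mean grad: {align:.3f}")
```

Under these synthetic assumptions the top eigenvalue captures nearly all of the trace and the alignment is close to 1, matching the report's description of an effectively rank-1 Fisher.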
Rank-1 EWC penalty for continual learning

The authors introduce a rank-1 variant of elastic weight consolidation that exploits the discovered gradient structure in diffusion models. This penalty is computationally efficient (comparable to diagonal approximation) while capturing the dominant curvature direction, unlike the commonly used diagonal Fisher approximation which fails to capture off-diagonal curvature.

7 retrieved papers · Can Refute
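The report does not reproduce the paper's exact penalty, but a rank-1 EWC term of the kind described can be sketched as follows. Assuming the Fisher is approximated by the outer product of a single direction (e.g., the mean gradient) with itself, the usual quadratic consolidation penalty collapses to one inner product, matching the O(d) memory and compute of the diagonal variant; function names and the precise form here are hypothetical.

```python
import numpy as np

def diag_ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """Standard diagonal-EWC quadratic penalty: 0.5*lam*sum_i F_ii*(theta_i - theta*_i)^2."""
    d = theta - theta_star
    return 0.5 * lam * np.sum(fisher_diag * d**2)

def rank1_ewc_penalty(theta, theta_star, mean_grad, lam=1.0):
    """Rank-1 penalty: with F ≈ g g^T the quadratic form collapses to
    a single inner product -- O(d) cost, like the diagonal variant."""
    d = theta - theta_star
    return 0.5 * lam * (mean_grad @ d) ** 2

rng = np.random.default_rng(1)
dim = 8
theta_star = rng.normal(size=dim)          # parameters after the old task
theta = theta_star + 0.1 * rng.normal(size=dim)
g = rng.normal(size=dim)                   # stand-in for the mean gradient

# With F = g g^T, the rank-1 form equals the exact quadratic penalty
# 0.5 * d^T F d, while the diagonal form (keeping only g_i^2) drops
# all off-diagonal curvature.
d = theta - theta_star
exact = 0.5 * d @ np.outer(g, g) @ d
print(rank1_ewc_penalty(theta, theta_star, g), exact,
      diag_ewc_penalty(theta, theta_star, g**2))
```

The comparison at the end makes the claimed advantage concrete: the rank-1 form reproduces the full quadratic exactly when the Fisher is truly rank-1, whereas the diagonal form generally does not.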
Combined rank-1 EWC and generative distillation framework

The authors develop a continual learning approach that pairs their rank-1 EWC penalty with generative distillation to encourage parameter sharing across tasks while constraining replay-induced drift. This combination addresses limitations where EWC alone struggles when task optima are disjoint and replay alone suffers from distributional shift.

9 retrieved papers
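As a toy illustration of how the two terms interact, the sketch below uses simple quadratic losses standing in for the diffusion and distillation objectives (all values hypothetical, not the paper's setup): a replay-style term pulls parameters back toward the old solution, while the rank-1 penalty blocks motion along the old task's dominant Fisher direction but leaves orthogonal directions free to adapt to the new task.

```python
import numpy as np

# Combined objective (names hypothetical):
#   L(theta) = L_new(theta) + alpha * L_replay(theta)
#              + 0.5 * lam * (g @ (theta - theta_star))**2
theta_star = np.array([1.0, 0.0])   # parameters after the old task
new_opt = np.array([0.0, 1.0])      # optimum of the new task
g = np.array([1.0, 0.0])            # dominant Fisher direction of old task
alpha, lam, lr = 0.5, 5.0, 0.1

def grad(theta):
    g_new = theta - new_opt                        # pull toward new task
    g_replay = theta - theta_star                  # replay pulls toward old solution
    g_ewc = lam * (g @ (theta - theta_star)) * g   # rank-1 penalty gradient
    return g_new + alpha * g_replay + g_ewc

theta = theta_star.copy()
for _ in range(200):
    theta = theta - lr * grad(theta)

# Along g (the old task's sensitive direction) theta barely moves;
# orthogonal to g it moves freely toward the new optimum.
print(theta)
```

In this toy, the first coordinate (along g) stays close to the old solution while the second moves most of the way to the new optimum, mirroring the report's framing: replay encourages sharing, and the rank-1 EWC term constrains replay-induced drift along the protected direction.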

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Theoretical and empirical characterization of rank-1 Fisher in diffusion models

Contribution

Rank-1 EWC penalty for continual learning

Contribution

Combined rank-1 EWC and generative distillation framework
