Asymptotic analysis of shallow and deep forgetting in replay with neural collapse

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: continual learning, neural collapse, deep learning
Abstract:

A persistent paradox in Continual Learning is that neural networks often retain linearly separable representations of past tasks even when their output predictions fail. We formalize this distinction as the gap between deep (feature-space) and shallow (classifier-level) forgetting. We demonstrate that experience replay affects these two levels asymmetrically: while even minimal buffers anchor feature geometry and prevent deep forgetting, mitigating shallow forgetting requires substantially larger buffers. To explain this, we extend the Neural Collapse framework to sequential training. We theoretically model deep forgetting as a geometric drift toward out-of-distribution subspaces, proving that replay guarantees asymptotic separability. In contrast, we show that shallow forgetting stems from an under-determined classifier optimization: the strong collapse of buffer data leads to rank-deficient covariances and inflated means, blinding the classifier to the true population boundaries. Our work unifies continual learning with OOD detection and challenges the reliance on large buffers, suggesting that explicitly correcting the statistical artifacts of Neural Collapse could unlock robust performance with minimal replay.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper formalizes the distinction between deep (feature-space) and shallow (classifier-level) forgetting in continual learning, demonstrating that replay affects these levels asymmetrically. It resides in the 'Theoretical Foundations and Asymmetric Forgetting Analysis' leaf, which contains only two papers in the entire 50-paper taxonomy. This sparse population suggests the paper addresses a relatively underexplored theoretical niche within continual learning, focusing on mechanistic analysis rather than empirical method development.

The taxonomy reveals that most continual learning research concentrates on practical techniques: replay mechanisms (11 papers across three sub-branches), generative methods (10 papers), and prototype-based approaches (8 papers). The paper's theoretical positioning contrasts sharply with these empirical branches. Its closest conceptual neighbors include classifier adaptation methods (3 papers) and representation maintenance work (3 papers), which address forgetting at specific network levels but lack the unified geometric framework proposed here. The taxonomy's domain-specific branch (13 papers) further highlights the field's applied orientation.

Among 29 candidates examined across three contributions, none clearly refuted the paper's claims. The formalization of deep versus shallow forgetting examined 9 candidates with no refutations; the Neural Collapse extension examined 10 with none; and the mechanistic explanation of shallow forgetting examined 10 with none. This absence of overlapping prior work within the limited search scope suggests the specific framing—linking Neural Collapse theory to continual learning's asymmetric forgetting—represents a novel analytical angle, though the search scale precludes definitive conclusions about the broader literature.

Based on top-29 semantic matches, the work appears to occupy a distinct theoretical position. The taxonomy structure confirms that analytical studies of forgetting mechanisms remain sparse compared to method-oriented research. However, the limited search scope means potentially relevant work in adjacent fields (e.g., neural collapse literature outside continual learning, or OOD detection theory) may not be fully captured. The novelty assessment reflects what was examined, not an exhaustive field survey.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 0

Research Landscape Overview

Core task: asymmetric effects of replay on feature-space and classifier forgetting. The field of continual learning addresses how models can acquire new knowledge without catastrophically forgetting previous tasks. The taxonomy reveals a rich structure organized around several complementary themes.

Replay Mechanisms and Buffer Management explores how stored examples are selected and maintained, with works like Replay Rehearsal Best Practices[2] and Adaptive Memory Replay Intrusion[6] examining storage strategies. Generative Replay and Synthetic Data investigates model-based approaches that synthesize past data using generative models, including diffusion-based methods such as Diffusion Distillation Replay[9] and Federated Diffusion Replay[49]. Prototype and Representation-Based Methods focuses on compact summaries and feature-level retention, while Classifier and Decision Boundary Adaptation tackles how decision surfaces evolve across tasks. Theoretical Foundations and Asymmetric Forgetting Analysis provides the analytical lens for understanding differential forgetting rates, and Domain-Specific Applications demonstrates these principles in specialized contexts such as audio, hyperspectral imaging, and spatiotemporal prediction.

A central tension runs through the literature between memory efficiency and retention fidelity: some approaches store raw examples while others rely on generative reconstruction or prototype compression. Recent work increasingly examines the interplay between representation learning and classifier adaptation, recognizing that forgetting manifests differently at these two levels.

Shallow Deep Forgetting[0] sits squarely within the Theoretical Foundations branch, providing analytical insight into how replay asymmetrically mitigates forgetting in feature extractors versus classification heads.
This contrasts with neighboring work like Brain Inspired Feature Exaggeration[35], which emphasizes biologically motivated mechanisms for enhancing feature discriminability. While many studies focus on empirical replay strategies or generative alternatives, Shallow Deep Forgetting[0] offers a principled decomposition of forgetting dynamics, helping to explain why replay's protective effects vary across network depth and informing more targeted mitigation strategies.

Claimed Contributions

Formalization of deep versus shallow forgetting and asymmetric replay effects

The authors formalize the distinction between deep forgetting (loss of feature-space separability) and shallow forgetting (classifier-level degradation). They empirically demonstrate that replay buffers mitigate these two forms of forgetting at fundamentally different rates, with small buffers sufficient for deep forgetting but large buffers required for shallow forgetting.

9 retrieved papers
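The claimed asymmetry can be illustrated with a toy sketch (entirely synthetic data and hypothetical probes, not the authors' experimental protocol): deep forgetting is absent when a fresh linear probe refit on frozen features still separates the old classes, while shallow forgetting appears when a stale classifier head, drifted toward the new task, fails on the same features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy old-task features after training on a new task: the two classes
# remain linearly separable along dimension 0 (little "deep" forgetting).
n, d = 200, 16
mu0, mu1 = np.zeros(d), np.zeros(d)
mu0[0], mu1[0] = -3.0, 3.0
X = np.vstack([rng.normal(mu0, 1.0, (n, d)), rng.normal(mu1, 1.0, (n, d))])
y = np.array([0] * n + [1] * n)

# "Deep" forgetting probe: refit a minimal classifier (nearest class mean)
# on the frozen features; high accuracy means separability is preserved.
m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
probe_pred = (np.linalg.norm(X - m1, axis=1)
              < np.linalg.norm(X - m0, axis=1)).astype(int)
probe_acc = (probe_pred == y).mean()

# "Shallow" forgetting: the stale head's boundary normal no longer aligns
# with the discriminative axis (here it points along a noise dimension).
w_stale = np.zeros(d)
w_stale[1] = 1.0
head_pred = (X @ w_stale > 0).astype(int)
head_acc = (head_pred == y).mean()

print(f"linear-probe accuracy (deep):  {probe_acc:.2f}")   # near 1.0
print(f"stale-head accuracy (shallow): {head_acc:.2f}")    # near chance
```

The same features thus yield near-perfect probe accuracy but near-chance head accuracy, matching the paper's claim that shallow forgetting can occur while feature-space separability is intact.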
Extension of Neural Collapse framework to continual learning

The authors extend the Neural Collapse theoretical framework to continual learning settings, including task-incremental, class-incremental, and domain-incremental learning. They characterize the asymptotic geometry of features and classifier heads under sequential training and prove that replay guarantees asymptotic separability in feature space.

10 retrieved papers
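The limiting geometry that Neural Collapse predicts for class means can be checked numerically. The sketch below uses the standard simplex equiangular tight frame (ETF) construction (the class count K is an arbitrary choice for illustration): all class means have equal norm and pairwise cosine similarity exactly -1/(K-1).

```python
import numpy as np

# Simplex ETF: columns of M are K unit-norm class means in K dimensions
# with equal pairwise angles, the configuration Neural Collapse predicts
# last-layer class means converge to.
K = 4
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

norms = np.linalg.norm(M, axis=0)          # all 1.0 by construction
cos = (M.T @ M) / np.outer(norms, norms)   # pairwise cosine similarities
off_diag = cos[~np.eye(K, dtype=bool)]

print(np.allclose(norms, 1.0))                    # True: equal norms
print(np.allclose(off_diag, -1.0 / (K - 1)))      # True: equiangular, cos = -1/3
```

Extending this framework to sequential training, as the paper claims to do, amounts to characterizing how this terminal geometry is reached (or distorted) when classes arrive task by task rather than jointly.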
Mechanistic explanation of shallow forgetting via under-determined classifier optimization

The authors demonstrate that shallow forgetting arises because Neural Collapse causes buffer data to collapse into a low-dimensional subspace, creating rank-deficient covariances and inflated means. This renders classifier optimization under-determined, causing decision boundaries to misalign with true population boundaries despite preserved feature separability.

10 retrieved papers
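The rank-deficiency mechanism can be sketched numerically (synthetic features only; the collapse scale 1e-6 and the buffer size are assumptions for illustration, not values from the paper). When a small buffer's features collapse onto the class mean, the empirical covariance loses rank and the mean becomes huge relative to the observed within-class scatter, so a classifier fit to buffer statistics is under-determined.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32

# Population features of one old class: full-rank spread around the mean.
mu = rng.normal(size=d)
pop = mu + rng.normal(size=(5000, d))

# Buffer features: a handful of replayed examples that have collapsed
# tightly onto the class mean (strong Neural Collapse on memorized data).
buf = mu + 1e-6 * rng.normal(size=(10, d))

# Covariance rank: at most n_samples - 1 for the buffer, full rank for
# the population.
pop_rank = np.linalg.matrix_rank(np.cov(pop, rowvar=False))
buf_rank = np.linalg.matrix_rank(np.cov(buf, rowvar=False))

# Mean-to-scatter ratio: the collapsed buffer makes the class mean look
# "inflated" relative to the within-class spread the classifier sees.
pop_ratio = np.linalg.norm(pop.mean(0)) / (pop - pop.mean(0)).std()
buf_ratio = np.linalg.norm(buf.mean(0)) / (buf - buf.mean(0)).std()

print(pop_rank, buf_rank)       # full rank vs. rank-deficient
print(buf_ratio / pop_ratio)    # orders of magnitude larger for the buffer
```

Under these assumed statistics, the buffer covariance has rank at most 9 in a 32-dimensional feature space, and the mean-to-scatter ratio explodes, illustrating why decision boundaries fit to buffer data can misalign with the true population boundaries even though the features themselves stay separable.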

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Formalization of deep versus shallow forgetting and asymmetric replay effects

The authors formalize the distinction between deep forgetting (loss of feature-space separability) and shallow forgetting (classifier-level degradation). They empirically demonstrate that replay buffers mitigate these two forms of forgetting at fundamentally different rates, with small buffers sufficient for deep forgetting but large buffers required for shallow forgetting.

Contribution

Extension of Neural Collapse framework to continual learning

The authors extend the Neural Collapse theoretical framework to continual learning settings, including task-incremental, class-incremental, and domain-incremental learning. They characterize the asymptotic geometry of features and classifier heads under sequential training and prove that replay guarantees asymptotic separability in feature space.

Contribution

Mechanistic explanation of shallow forgetting via under-determined classifier optimization

The authors demonstrate that shallow forgetting arises because Neural Collapse causes buffer data to collapse into a low-dimensional subspace, creating rank-deficient covariances and inflated means. This renders classifier optimization under-determined, causing decision boundaries to misalign with true population boundaries despite preserved feature separability.