Asymptotic analysis of shallow and deep forgetting in replay with neural collapse

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: continual learning, neural collapse, deep learning
Abstract:

A persistent paradox in Continual Learning is that neural networks often retain linearly separable representations of past tasks even when their output predictions fail. We formalize this distinction as the gap between deep (feature-space) and shallow (classifier-level) forgetting. We demonstrate that experience replay affects these two levels asymmetrically: while even minimal buffers anchor feature geometry and prevent deep forgetting, mitigating shallow forgetting requires substantially larger buffers. To explain this, we extend the Neural Collapse framework to sequential training. We theoretically model deep forgetting as a geometric drift toward out-of-distribution subspaces, proving that replay guarantees asymptotic separability. In contrast, we show that shallow forgetting stems from an under-determined classifier optimization: the strong collapse of buffer data leads to rank-deficient covariances and inflated means, blinding the classifier to the true population boundaries. Our work unifies continual learning with OOD detection and challenges the reliance on large buffers, suggesting that explicitly correcting the statistical artifacts of Neural Collapse could unlock robust performance with minimal replay.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper formalizes the distinction between deep (feature-space) and shallow (classifier-level) forgetting in continual learning, demonstrating that replay affects these levels asymmetrically. It resides in the 'Theoretical Foundations and Asymmetric Forgetting Analysis' leaf, which contains only two papers in the entire 50-paper taxonomy. This sparse population suggests the paper addresses a relatively underexplored theoretical niche within continual learning, focusing on mechanistic analysis rather than empirical method development.

The taxonomy reveals that most continual learning research concentrates on practical techniques: replay mechanisms (11 papers across three sub-branches), generative methods (10 papers), and prototype-based approaches (8 papers). The paper's theoretical positioning contrasts sharply with these empirical branches. Its closest conceptual neighbors include classifier adaptation methods (3 papers) and representation maintenance work (3 papers), which address forgetting at specific network levels but lack the unified geometric framework proposed here. The taxonomy's domain-specific branch (13 papers) further highlights the field's applied orientation.

Among 29 candidates examined across three contributions, none clearly refuted the paper's claims. The formalization of deep versus shallow forgetting examined 9 candidates with no refutations; the Neural Collapse extension examined 10 with none; and the mechanistic explanation of shallow forgetting examined 10 with none. This absence of overlapping prior work within the limited search scope suggests the specific framing—linking Neural Collapse theory to continual learning's asymmetric forgetting—represents a novel analytical angle, though the search scale precludes definitive conclusions about the broader literature.

Based on top-29 semantic matches, the work appears to occupy a distinct theoretical position. The taxonomy structure confirms that analytical studies of forgetting mechanisms remain sparse compared to method-oriented research. However, the limited search scope means potentially relevant work in adjacent fields (e.g., neural collapse literature outside continual learning, or OOD detection theory) may not be fully captured. The novelty assessment reflects what was examined, not an exhaustive field survey.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 0

Research Landscape Overview

Core task: asymmetric effects of replay on feature-space and classifier forgetting. The field of continual learning addresses how models can acquire new knowledge without catastrophically forgetting previous tasks. The taxonomy reveals a rich structure organized around several complementary themes.

Replay Mechanisms and Buffer Management explores how stored examples are selected and maintained, with works like Replay Rehearsal Best Practices[2] and Adaptive Memory Replay Intrusion[6] examining storage strategies. Generative Replay and Synthetic Data investigates model-based approaches that synthesize past data using generative models, including diffusion-based methods such as Diffusion Distillation Replay[9] and Federated Diffusion Replay[49]. Prototype and Representation-Based Methods focuses on compact summaries and feature-level retention, while Classifier and Decision Boundary Adaptation tackles how decision surfaces evolve across tasks. Theoretical Foundations and Asymmetric Forgetting Analysis provides the analytical lens for understanding differential forgetting rates, and Domain-Specific Applications demonstrates these principles in specialized contexts such as audio, hyperspectral imaging, and spatiotemporal prediction.

A central tension runs through the literature between memory efficiency and retention fidelity: some approaches store raw examples while others rely on generative reconstruction or prototype compression. Recent work increasingly examines the interplay between representation learning and classifier adaptation, recognizing that forgetting manifests differently at these two levels.

Shallow Deep Forgetting[0] sits squarely within the Theoretical Foundations branch, providing analytical insight into how replay asymmetrically mitigates forgetting in feature extractors versus classification heads.
This contrasts with neighboring work like Brain Inspired Feature Exaggeration[35], which emphasizes biologically motivated mechanisms for enhancing feature discriminability. While many studies focus on empirical replay strategies or generative alternatives, Shallow Deep Forgetting[0] offers a principled decomposition of forgetting dynamics, helping to explain why replay's protective effects vary across network depth and informing more targeted mitigation strategies.

Claimed Contributions

Formalization of deep versus shallow forgetting and asymmetric replay effects

The authors formalize the distinction between deep forgetting (loss of feature-space separability) and shallow forgetting (classifier-level degradation). They empirically demonstrate that replay buffers mitigate these two forms of forgetting at fundamentally different rates, with small buffers sufficient for deep forgetting but large buffers required for shallow forgetting.

9 retrieved papers
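The claimed asymmetry can be illustrated with a toy sketch (entirely synthetic data and hypothetical probes, not the authors' experimental protocol): deep forgetting is absent when a fresh linear probe refit on frozen features still separates the old classes, while shallow forgetting appears when a stale classifier head, drifted toward the new task, fails on the same features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy old-task features after training on a new task: the two classes
# remain linearly separable along dimension 0 (little "deep" forgetting).
n, d = 200, 16
mu0, mu1 = np.zeros(d), np.zeros(d)
mu0[0], mu1[0] = -3.0, 3.0
X = np.vstack([rng.normal(mu0, 1.0, (n, d)), rng.normal(mu1, 1.0, (n, d))])
y = np.array([0] * n + [1] * n)

# "Deep" forgetting probe: refit a minimal classifier (nearest class mean)
# on the frozen features; high accuracy means separability is preserved.
m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
probe_pred = (np.linalg.norm(X - m1, axis=1)
              < np.linalg.norm(X - m0, axis=1)).astype(int)
probe_acc = (probe_pred == y).mean()

# "Shallow" forgetting: the stale head's boundary normal no longer aligns
# with the discriminative axis (here it points along a noise dimension).
w_stale = np.zeros(d)
w_stale[1] = 1.0
head_pred = (X @ w_stale > 0).astype(int)
head_acc = (head_pred == y).mean()

print(f"linear-probe accuracy (deep):  {probe_acc:.2f}")   # near 1.0
print(f"stale-head accuracy (shallow): {head_acc:.2f}")    # near chance
```

The same features thus yield near-perfect probe accuracy but near-chance head accuracy, matching the paper's claim that shallow forgetting can occur while feature-space separability is intact.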
Extension of Neural Collapse framework to continual learning

The authors extend the Neural Collapse theoretical framework to continual learning settings, including task-incremental, class-incremental, and domain-incremental learning. They characterize the asymptotic geometry of features and classifier heads under sequential training and prove that replay guarantees asymptotic separability in feature space.

10 retrieved papers
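The limiting geometry that Neural Collapse predicts for class means can be checked numerically. The sketch below uses the standard simplex equiangular tight frame (ETF) construction (the class count K is an arbitrary choice for illustration): all class means have equal norm and pairwise cosine similarity exactly -1/(K-1).

```python
import numpy as np

# Simplex ETF: columns of M are K unit-norm class means in K dimensions
# with equal pairwise angles, the configuration Neural Collapse predicts
# last-layer class means converge to.
K = 4
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

norms = np.linalg.norm(M, axis=0)          # all 1.0 by construction
cos = (M.T @ M) / np.outer(norms, norms)   # pairwise cosine similarities
off_diag = cos[~np.eye(K, dtype=bool)]

print(np.allclose(norms, 1.0))                    # True: equal norms
print(np.allclose(off_diag, -1.0 / (K - 1)))      # True: equiangular, cos = -1/3
```

Extending this framework to sequential training, as the paper claims to do, amounts to characterizing how this terminal geometry is reached (or distorted) when classes arrive task by task rather than jointly.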
Mechanistic explanation of shallow forgetting via under-determined classifier optimization

The authors demonstrate that shallow forgetting arises because Neural Collapse causes buffer data to collapse into a low-dimensional subspace, creating rank-deficient covariances and inflated means. This renders classifier optimization under-determined, causing decision boundaries to misalign with true population boundaries despite preserved feature separability.

10 retrieved papers
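The rank-deficiency mechanism can be sketched numerically (synthetic features only; the collapse scale 1e-6 and the buffer size are assumptions for illustration, not values from the paper). When a small buffer's features collapse onto the class mean, the empirical covariance loses rank and the mean becomes huge relative to the observed within-class scatter, so a classifier fit to buffer statistics is under-determined.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32

# Population features of one old class: full-rank spread around the mean.
mu = rng.normal(size=d)
pop = mu + rng.normal(size=(5000, d))

# Buffer features: a handful of replayed examples that have collapsed
# tightly onto the class mean (strong Neural Collapse on memorized data).
buf = mu + 1e-6 * rng.normal(size=(10, d))

# Covariance rank: at most n_samples - 1 for the buffer, full rank for
# the population.
pop_rank = np.linalg.matrix_rank(np.cov(pop, rowvar=False))
buf_rank = np.linalg.matrix_rank(np.cov(buf, rowvar=False))

# Mean-to-scatter ratio: the collapsed buffer makes the class mean look
# "inflated" relative to the within-class spread the classifier sees.
pop_ratio = np.linalg.norm(pop.mean(0)) / (pop - pop.mean(0)).std()
buf_ratio = np.linalg.norm(buf.mean(0)) / (buf - buf.mean(0)).std()

print(pop_rank, buf_rank)       # full rank vs. rank-deficient
print(buf_ratio / pop_ratio)    # orders of magnitude larger for the buffer
```

Under these assumed statistics, the buffer covariance has rank at most 9 in a 32-dimensional feature space, and the mean-to-scatter ratio explodes, illustrating why decision boundaries fit to buffer data can misalign with the true population boundaries even though the features themselves stay separable.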

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Formalization of deep versus shallow forgetting and asymmetric replay effects

The authors formalize the distinction between deep forgetting (loss of feature-space separability) and shallow forgetting (classifier-level degradation). They empirically demonstrate that replay buffers mitigate these two forms of forgetting at fundamentally different rates, with small buffers sufficient for deep forgetting but large buffers required for shallow forgetting.

Contribution

Extension of Neural Collapse framework to continual learning

The authors extend the Neural Collapse theoretical framework to continual learning settings, including task-incremental, class-incremental, and domain-incremental learning. They characterize the asymptotic geometry of features and classifier heads under sequential training and prove that replay guarantees asymptotic separability in feature space.

Contribution

Mechanistic explanation of shallow forgetting via under-determined classifier optimization

The authors demonstrate that shallow forgetting arises because Neural Collapse causes buffer data to collapse into a low-dimensional subspace, creating rank-deficient covariances and inflated means. This renders classifier optimization under-determined, causing decision boundaries to misalign with true population boundaries despite preserved feature separability.