Towards the Three-Phase Dynamics of Generalization Power of a DNN

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Generalization Analysis, Learning Dynamics, Deep Learning Theory
Abstract:

This paper addresses a core challenge in the field of symbolic generalization: how to define, quantify, and track the dynamics of the generalizable and non-generalizable interactions encoded by a DNN throughout training. Specifically, this work builds on a recent theoretical achievement in explainable AI, which proves that the detailed inference patterns of a DNN can be strictly rewritten as a small number of AND-OR interaction patterns. Based on this, we propose an efficient method to quantify the generalization power of each interaction, and we discover a distinct three-phase dynamics in the generalization power of interactions during training. In the early phase, training typically removes noisy, non-generalizable interactions and learns simple, generalizable ones. The second and third phases tend to capture increasingly complex interactions that are harder to generalize. Experimental results verify that the learning of non-generalizable interactions is the direct cause of the gap between the training and testing losses.
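The AND-OR interaction framework referenced in the abstract builds on the Harsanyi dividend from cooperative game theory. As a sketch of the standard definition used in this line of work (the notation below is assumed from that literature, not taken from this report):

```latex
% v(T): the model output on an input in which only the variables in T
% are kept and all other variables are masked to baseline values.
% The AND interaction (Harsanyi dividend) of a variable subset S:
I_{\mathrm{and}}(S) \;=\; \sum_{T \subseteq S} (-1)^{|S| - |T|}\, v(T)
% This yields an exact decomposition of the network output on the
% full variable set N:
v(N) \;=\; \sum_{S \subseteq N} I_{\mathrm{and}}(S)
% Sparsity results in this line of work show that only a small number
% of subsets S carry non-negligible effects I_{\mathrm{and}}(S).
```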

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes an efficient method to quantify the generalization power of individual interactions in DNNs and discovers a three-phase learning dynamic. It resides in the 'Training Dynamics and Temporal Evolution of Interactions' leaf, which contains five papers in total, including the work under review. This leaf sits within the broader 'Interaction-Based Explanation and Generalization Theory' branch, indicating a moderately populated research direction focused specifically on temporal aspects of interaction learning. The taxonomy shows this is a specialized but active area, distinct from static interaction extraction and domain-specific applications.

The paper's leaf neighbors include works examining two-phase dynamics, symbolic interaction evolution, and layerwise knowledge propagation. The broader parent branch encompasses core interaction theory, generalization power quantification methods, and analysis of confusing samples. Adjacent top-level branches explore information-theoretic perspectives, feature selection techniques, and architectural generalization strategies. The taxonomy structure reveals that while interaction-based explanations form a coherent research thread, this work's focus on three-phase temporal dynamics positions it at the intersection of theoretical interaction frameworks and empirical training analysis, bridging static quantification methods with dynamic learning characterization.

Among the thirty candidates examined through semantic search, none clearly refuted any of the three core contributions. For the quantification method, ten candidates were reviewed with zero refutable overlaps. For the three-phase dynamics discovery, ten papers were likewise examined without finding prior work describing this specific temporal pattern. The causal link between non-generalizable interactions and loss gaps also showed no clear refutation across ten candidates. This suggests that the specific combination of efficient quantification, three-phase characterization, and causal analysis represents a novel synthesis, though the limited search scope means that relevant work outside the top thirty semantic matches may exist.

Based on the examined literature, the work appears to offer substantive contributions within its specialized research area. The taxonomy reveals a moderately crowded field of interaction-based generalization studies, but the specific three-phase temporal characterization distinguishes this work from prior two-phase analyses. The analysis covers the top thirty semantic matches plus citation expansion, providing reasonable confidence in the novelty claims, while exhaustive coverage of all related training-dynamics research remains beyond scope. The lack of refutable candidates across all contributions suggests meaningful differentiation from the examined prior work.

Taxonomy

Core-task Taxonomy Papers: 38
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Quantifying and tracking the generalization power of interactions in deep neural networks. The field encompasses diverse perspectives on how neural networks learn and generalize through feature interactions. The taxonomy reveals several major branches: Interaction-Based Explanation and Generalization Theory examines how interactions evolve during training and contribute to model performance; Information-Theoretic and Probabilistic Perspectives formalize generalization via concepts like information bottlenecks and Bayesian frameworks; Feature Interaction Detection and Selection Methods develop techniques to identify and leverage important feature combinations; Domain-Specific Interaction Modeling applies these ideas to specialized tasks; Generalization Enhancement strategies propose architectural and training innovations; and Alternative Representation and Complexity Frameworks explore different mathematical lenses for understanding learning dynamics.

Works like Generalizable Interaction Primitives[1] and Interactive Concepts[2] illustrate how researchers formalize interaction structures, while Generalization Mystery[6] and Generalized Information Bottleneck[4] reflect theoretical efforts to explain why networks generalize.

A particularly active line of research focuses on the temporal evolution of interactions during training. Three-Phase Dynamics[0] sits within this cluster, examining how interaction patterns shift across distinct learning phases. This work closely relates to Two-Phase Dynamics[18] and Symbolic Interactions Dynamics[11], which similarly track how networks transition between different learning regimes. Nearby, Layerwise Knowledge Change[3] and Tracking Knowledge Layers[36] investigate how knowledge propagates through network depth over time. The central tension across these studies concerns whether interaction dynamics follow universal patterns or depend heavily on architecture and task.

Three-Phase Dynamics[0] contributes by identifying a three-stage progression, contrasting with the two-phase characterization in related work and offering a more granular view of how generalizable interactions emerge and stabilize during optimization.

Claimed Contributions

Efficient method to quantify generalization power of individual interactions

The authors introduce a method that quantifies the generalization power of each individual interaction encoded by a DNN by measuring its transferability to a baseline DNN trained on testing samples, avoiding computationally prohibitive exhaustive search across test samples.

10 retrieved papers
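The transferability idea can be illustrated with a toy sketch: compute each interaction's effect in the analyzed model and in a baseline model trained on testing samples, then score how much of the effect carries over. The Möbius-style computation below follows the standard Harsanyi definition; the `transfer_score` normalization is a hypothetical stand-in for the paper's actual metric, and the toy masked-output functions are illustrative.

```python
from itertools import combinations

def harsanyi_interactions(v, n):
    """Compute I(S) = sum_{T subset of S} (-1)^(|S|-|T|) v(T) for every
    S in {0..n-1}. `v` maps a frozenset of retained variables to a scalar
    model output (exponential in n; fine for a toy example)."""
    interactions = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            S = frozenset(subset)
            interactions[S] = sum(
                (-1) ** (len(S) - r) * v(frozenset(T))
                for r in range(len(S) + 1)
                for T in combinations(sorted(S), r)
            )
    return interactions

def transfer_score(I_model, I_baseline, S):
    """Hypothetical generalization power of interaction S: the fraction of
    its effect that transfers to the baseline model (sign-consistent
    overlap, normalized by the larger magnitude)."""
    a, b = I_model[S], I_baseline[S]
    if a == 0 and b == 0:
        return 1.0
    shared = min(abs(a), abs(b)) if a * b > 0 else 0.0
    return shared / max(abs(a), abs(b))

# Toy masked-output functions: additive effects plus an AND term on {0, 1}
# whose strength differs between the model and the baseline.
v_model = lambda T: len(T) + (1.5 if {0, 1} <= T else 0.0)
v_base = lambda T: len(T) + (0.5 if {0, 1} <= T else 0.0)
I_m = harsanyi_interactions(v_model, 2)
I_b = harsanyi_interactions(v_base, 2)
score = transfer_score(I_m, I_b, frozenset({0, 1}))  # only 0.5 of 1.5 transfers
```

An interaction with a score near 1 is fully reproduced by the baseline model and would count as generalizable under this toy criterion; a score near 0 marks a model-specific, non-generalizable pattern.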
Discovery of three-phase dynamics of generalization power during training

The authors identify and characterize a three-phase pattern in how the generalization power of interactions evolves throughout DNN training: early removal of non-generalizable interactions and learning of simple generalizable ones, followed by learning increasingly complex and less generalizable interactions, and finally learning predominantly non-generalizable interactions that cause overfitting.

10 retrieved papers
Causal link between non-generalizable interactions and training-testing loss gap

The authors establish that non-generalizable interactions directly cause the gap between training and testing losses, demonstrating through experiments that removing these interactions significantly reduces this gap by primarily increasing training loss while minimally affecting testing loss.

10 retrieved papers
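Because the network output decomposes into a sum of interaction effects, the removal experiment described above can be sketched as filtering that sum by a generalization-score threshold. The effect values, score dictionary, and threshold below are illustrative placeholders, not the paper's data.

```python
def filtered_output(interactions, gen_scores, tau):
    """Reconstruct a model output from the decomposition v(N) = sum_S I(S),
    keeping only interactions whose generalization score reaches tau.
    Comparing train/test losses computed from this filtered output against
    the full output isolates the role of non-generalizable interactions."""
    return sum(
        effect
        for S, effect in interactions.items()
        if gen_scores.get(S, 0.0) >= tau
    )

# Illustrative effects: two generalizable terms and one non-generalizable one.
I = {frozenset(): 0.1, frozenset({0}): 1.0, frozenset({0, 1}): -0.5}
scores = {frozenset(): 1.0, frozenset({0}): 0.9, frozenset({0, 1}): 0.2}
v_full = sum(I.values())                 # unfiltered output
v_gen = filtered_output(I, scores, 0.5)  # generalizable part only
```

In the experiment the report describes, repeating this filtering on training and testing samples and re-evaluating both losses is what reveals that the removed (non-generalizable) interactions mainly lower the training loss, widening the train-test gap.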

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Efficient method to quantify generalization power of individual interactions


Contribution

Discovery of three-phase dynamics of generalization power during training


Contribution

Causal link between non-generalizable interactions and training-testing loss gap
