Intrinsic training dynamics of deep neural networks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: gradient flow, path-lifting, intrinsic lower dimensional dynamic, conservation laws, implicit bias
Abstract:

A fundamental challenge in the theory of deep learning is to understand whether gradient-based training can promote parameters belonging to certain lower-dimensional structures (e.g., sparse or low-rank sets), leading to so-called implicit bias. As a stepping stone, motivated by the proof structure of existing implicit bias analyses, we study when a gradient flow on a parameter θ implies an intrinsic gradient flow on a "lifted" variable z = φ(θ), for an architecture-related function φ. We formalize a so-called intrinsic dynamic property and show how it relates to the study of conservation laws associated with the factorization φ. This leads to a simple criterion based on the inclusion of kernels of linear maps, which yields a necessary condition for this property to hold. We then apply our theory to general ReLU networks of arbitrary depth and show that, when φ is the so-called path-lifting, for a dense set of initializations the flow can be rewritten as an intrinsic dynamic in a lower dimension that depends only on z and the initialization. In the case of linear networks, where φ is the product of weight matrices, the intrinsic dynamic is known to hold under so-called balanced initializations; we generalize this to a broader class of relaxed balanced initializations, showing that, in certain configurations, these are the only initializations that ensure the intrinsic metric property. Finally, for the linear neural ODE associated with the limit of infinitely deep linear networks, with relaxed balanced initialization, we make the corresponding intrinsic dynamics explicit.
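To make the mechanism concrete, the display below is a minimal sketch of the generic chain-rule argument behind such rewritings, assuming the training loss factors through the lifting as L = ℓ ∘ φ (a standard setup in this line of work); the notation ℓ and M is ours for illustration and need not match the paper's.

```latex
% Gradient flow on theta with a loss that factors through phi:
%   L(\theta) = \ell(\phi(\theta)), \qquad z = \phi(\theta).
\dot\theta(t) = -\nabla L(\theta(t)) = -\,D\phi(\theta(t))^\top \nabla \ell(z(t)),
\qquad
\dot z(t) = D\phi(\theta(t))\,\dot\theta(t)
          = -\,\underbrace{D\phi(\theta(t))\,D\phi(\theta(t))^\top}_{=:\,M(\theta(t))}\,\nabla \ell(z(t)).
% The dynamic on z is "intrinsic" when M(\theta(t)) can be written as a function of
% z(t) and the initialization alone, so that z solves a closed ODE
%   \dot z = -\widetilde{M}\big(z, \theta(0)\big)\,\nabla \ell(z).
```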

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates when gradient flow on network parameters induces an intrinsic gradient flow on a lifted variable, introducing a criterion based on conservation laws and kernel inclusions. It resides in the 'Intrinsic Dynamics and Conservation Laws' leaf, which contains only this single paper within the 50-paper taxonomy. This isolation suggests the specific focus on intrinsic dynamics via conservation laws and lifted parameterizations is relatively unexplored in the surveyed literature, occupying a sparse niche within the broader Training Dynamics and Trajectory Analysis branch.

The taxonomy reveals that neighboring leaves address related but distinct aspects of training dynamics. The 'Neural Tangent Kernel and Linearization Regimes' category examines lazy training where features remain fixed, while 'Feature Learning and Adaptive Regimes' studies nonlinear feature evolution. The 'Stability and Dynamical Properties' leaf characterizes perturbation effects and stable minima. This paper's focus on conservation laws and intrinsic geometric structure bridges these areas by providing a framework to understand when parameter-space dynamics can be rewritten in lower-dimensional lifted coordinates, complementing but not directly overlapping with kernel-regime or feature-learning analyses.

Among fourteen candidates examined, no contribution was clearly refuted by prior work. The first contribution on intrinsic dynamic properties examined one candidate with no refutation. The second contribution on ReLU networks via path-lifting examined three candidates, none refuting. The third contribution on relaxed balanced initializations for linear networks examined ten candidates, again with no refutations. This limited search scope—fourteen papers total—suggests the analysis captures closely related work but cannot claim exhaustive coverage of all potentially relevant prior art in implicit bias or conservation-law frameworks.

Based on the top-fourteen semantic matches and the taxonomy structure, the work appears to occupy a genuinely sparse research direction. The absence of sibling papers in its taxonomy leaf and the lack of refuting candidates among examined literature suggest novelty in its specific theoretical framework. However, the limited search scale means broader connections to implicit regularization, geometric optimization, or dynamical systems perspectives outside the examined set remain uncertain.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 14
Refutable Papers: 0

Research Landscape Overview

Core task: gradient flow dynamics in overparameterized neural networks. The field is organized around several complementary perspectives on how and why gradient-based training succeeds in the overparameterized regime. Convergence Theory and Global Optimization investigates conditions under which gradient descent provably reaches global minima, often leveraging neural tangent kernel (NTK) approximations or analyzing the loss landscape's benign geometry (e.g., Gradient descent provably optimizes[1], Stochastic gradient descent optimizes[4]). Generalization and Learning Theory examines how overparameterization affects test performance, exploring implicit regularization and algorithm-dependent bounds (e.g., Generalization error bounds of[3], Algorithm-Dependent Generalization Bounds for[31]). Training Dynamics and Trajectory Analysis focuses on the evolution of parameters and features during training, uncovering conservation laws, phase transitions, and intrinsic geometric structures that govern the optimization path (e.g., The dynamics of gradient[2], Understanding the dynamics of[6]). Loss Landscape Geometry and Critical Points studies the topological and geometric properties of the loss surface, while Implicit Regularization and Bias investigates the inductive biases that gradient flow induces. Specialized Architectures and Problem Settings and Advanced Optimization Methods and Algorithms address domain-specific challenges and algorithmic refinements.

A particularly active line of work examines the fine-grained trajectory structure and emergent phenomena during training, contrasting lazy (NTK-like) and feature-learning regimes. Some studies emphasize conservation laws and dynamical stability (e.g., Characterizing Dynamical Stability of[11], Dynamics and Perturbations of[15]), while others investigate how optimization trajectories exhibit hallmarks such as progressive sharpening or feature alignment (e.g., Hallmarks of Optimization Trajectories[41]).

Intrinsic training dynamics of[0] sits within this trajectory-focused branch, specifically exploring intrinsic dynamics and conservation laws that govern parameter evolution. Its emphasis on uncovering invariant quantities and geometric constraints during training aligns closely with works like The dynamics of gradient[2] and Understanding the dynamics of[6], yet it distinguishes itself by probing deeper into the conservation principles that persist across different initialization and architecture choices, offering a complementary lens to studies that focus primarily on convergence rates or generalization bounds.

Claimed Contributions

Intrinsic dynamic property and its characterization via conservation laws

The authors introduce the intrinsic dynamic property (Definition 2.6) and establish its relationship to conservation laws. They provide a simple criterion based on kernel inclusion of linear maps (Theorem 2.14) that yields a necessary condition for this property to hold, connecting gradient flows in parameter space to intrinsic flows in lifted variable space.

1 retrieved paper
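As a toy illustration of how conservation laws enter this contribution (a generic, well-known mechanism, not a restatement of the paper's Theorem 2.14; the scalar factorization below is a hypothetical example): when the loss factors as L(θ) = ℓ(φ(θ)), any quantity h whose gradient lies in ker(Dφ(θ)) is conserved along the gradient flow, since dh/dt = -(Dφ(θ)∇h(θ))ᵀ∇ℓ(z) = 0.

```python
import numpy as np

# Toy illustration of the conservation-law mechanism (not the paper's Theorem 2.14):
# with L(theta) = ell(phi(theta)), gradient flow conserves any h such that
# Dphi(theta) @ grad_h(theta) = 0, i.e. grad_h(theta) lies in ker(Dphi(theta)).
# Here phi(u, v) = u * v is a scalar "two-layer" factorisation and
# h(u, v) = u**2 - v**2 is the classical balancedness quantity.

def phi(theta):
    u, v = theta
    return u * v

def d_phi(theta):                          # Jacobian of phi: a 1x2 linear map
    u, v = theta
    return np.array([[v, u]])

def grad_h(theta):                         # gradient of h(u, v) = u^2 - v^2
    u, v = theta
    return np.array([2 * u, -2 * v])

def grad_loss(theta, target=1.5):          # L(theta) = 0.5 * (phi(theta) - target)^2
    return d_phi(theta).T @ np.array([phi(theta) - target])

theta = np.array([0.8, -0.3])
print(d_phi(theta) @ grad_h(theta))        # [0.]: grad_h lies in ker(Dphi) at every theta

# Hence h is (numerically) constant along the discretised gradient flow.
h0 = theta[0] ** 2 - theta[1] ** 2
dt = 1e-3
for _ in range(20000):
    theta = theta - dt * grad_loss(theta)
print(h0, theta[0] ** 2 - theta[1] ** 2)   # nearly equal, while phi(theta) -> target
```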
Intrinsic dynamics for general ReLU networks via path-lifting

The authors prove that for general ReLU networks of arbitrary depth using path-lifting reparametrization, the gradient flow can be rewritten as an intrinsic dynamic for a dense set of initializations (Theorem 3.1 and Corollary 3.2). This extends previous results limited to two-layer networks to arbitrary DAG architectures.

3 retrieved papers
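For readers unfamiliar with the path-lifting invoked in this contribution: for a bias-free two-layer ReLU network it collects the products of weights along each input-hidden-output path, and it is invariant to the rescaling symmetries of ReLU units. The sketch below illustrates this on a toy network; it is a simplified stand-in for the general DAG construction used in the paper, and all function and variable names are ours.

```python
import numpy as np

# Minimal sketch of a path-lifting for a bias-free two-layer ReLU network
#   f(W, v; x) = sum_j v[j] * relu(W[j, :] @ x).
# Each coordinate of the lifted variable z = phi(W, v) is the product of the weights
# along one input -> hidden unit -> output path: z[j, k] = v[j] * W[j, k].
# Simplified illustration only; the paper's phi covers general DAG ReLU networks.

def forward(W, v, x):
    return v @ np.maximum(W @ x, 0.0)

def path_lifting(W, v):
    # One path per (hidden unit j, input coordinate k).
    return v[:, None] * W                              # shape: (hidden, input)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
v = rng.normal(size=4)
x = rng.normal(size=3)

# On any input, the output is a linear function of x whose coefficients are the
# path-lifting coordinates of the currently active paths.
active = (W @ x > 0).astype(float)                     # which hidden units fire on x
lhs = forward(W, v, x)
rhs = (active[:, None] * path_lifting(W, v) * x[None, :]).sum()
print(np.isclose(lhs, rhs))                            # True

# z is invariant under the ReLU rescaling symmetry (W[j] -> lam*W[j], v[j] -> v[j]/lam),
# one reason it is a natural "lifted" coordinate for ReLU networks.
lam = 2.5
W2, v2 = W.copy(), v.copy()
W2[0] *= lam
v2[0] /= lam
print(np.allclose(path_lifting(W, v), path_lifting(W2, v2)))   # True
```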
Relaxed balanced initializations for linear networks

The authors introduce relaxed balanced initializations (Definition 3.4) as a generalization of the balanced conditions for linear networks. They prove these initializations satisfy the intrinsic metric property (Theorem 3.6, Theorem 3.9) and show that in certain configurations they are necessary and sufficient (Theorem 3.7). They also provide explicit intrinsic dynamics for linear neural ODEs under these conditions (Theorem 3.11); a numerical sketch of the classical balanced case follows this item.

10 retrieved papers
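For context on the balanced condition being relaxed here: in deep linear networks trained by gradient flow on a quadratic loss, the defects W_{i+1}ᵀ W_{i+1} - W_i W_iᵀ are conserved, so balanced initializations (all defects zero) stay balanced along the flow, which is what lets the end-to-end product follow its own intrinsic dynamic. The sketch below checks this classical conservation numerically; it does not reproduce the paper's relaxed condition (Definition 3.4), and the setup (dimensions, loss, step size) is a hypothetical example of ours.

```python
import numpy as np

# Sketch of a standard deep-linear-network fact (balanced case only; the paper's
# "relaxed balanced" condition of Definition 3.4 is not reproduced here):
# for f(x) = W[L-1] @ ... @ W[0] @ x trained by gradient flow on a quadratic loss,
# the defects D_i = W[i+1].T @ W[i+1] - W[i] @ W[i].T are conserved. Balanced
# initializations (all D_i = 0) therefore remain balanced, which is what allows
# the product W[L-1]...W[0] to follow its own (intrinsic) dynamic.

rng = np.random.default_rng(0)
dims = [3, 4, 4, 2]                                    # layer widths d0 -> d3
Ws = [rng.normal(size=(dims[i + 1], dims[i])) / np.sqrt(dims[i]) for i in range(3)]
X = rng.normal(size=(3, 32))
Y = rng.normal(size=(2, 32))

def product(Ws):
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    return P

def defects(Ws):
    return [Ws[i + 1].T @ Ws[i + 1] - Ws[i] @ Ws[i].T for i in range(len(Ws) - 1)]

def grads(Ws):
    # Gradients of 0.5/n * ||W[L-1]...W[0] X - Y||_F^2 with respect to each factor.
    n = X.shape[1]
    R = product(Ws) @ X - Y
    gs = []
    for i in range(len(Ws)):
        left = product(Ws[i + 1:]) if i + 1 < len(Ws) else np.eye(dims[-1])
        right = product(Ws[:i]) @ X if i > 0 else X
        gs.append(left.T @ R @ right.T / n)
    return gs

d0 = defects(Ws)
dt = 1e-3
for _ in range(5000):                                  # forward-Euler discretisation of the flow
    Ws = [W - dt * g for W, g in zip(Ws, grads(Ws))]

drift = [np.linalg.norm(a - b) for a, b in zip(d0, defects(Ws))]
print(drift)   # small: exactly conserved by the continuous-time flow, up to Euler error here
```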

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one limited by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Intrinsic dynamic property and its characterization via conservation laws

The authors introduce the intrinsic dynamic property (Definition 2.6) and establish its relationship to conservation laws. They provide a simple criterion based on kernel inclusion of linear maps (Theorem 2.14) that yields a necessary condition for this property to hold, connecting gradient flows in parameter space to intrinsic flows in lifted variable space.

Contribution

Intrinsic dynamics for general ReLU networks via path-lifting

The authors prove that for general ReLU networks of arbitrary depth using path-lifting reparametrization, the gradient flow can be rewritten as an intrinsic dynamic for a dense set of initializations (Theorem 3.1 and Corollary 3.2). This extends previous results limited to two-layer networks to arbitrary DAG architectures.

Contribution

Relaxed balanced initializations for linear networks

The authors introduce relaxed balanced initializations (Definition 3.4) as a generalization of balanced conditions for linear networks. They prove these initializations satisfy the intrinsic metric property (Theorem 3.6, Theorem 3.9) and show that in certain configurations, these are necessary and sufficient conditions (Theorem 3.7). They also provide explicit intrinsic dynamics for linear neural ODEs under these conditions (Theorem 3.11).