Intrinsic training dynamics of deep neural networks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: gradient flow, path-lifting, intrinsic lower dimensional dynamic, conservation laws, implicit bias
Abstract:

A fundamental challenge in the theory of deep learning is to understand whether gradient-based training can promote parameters belonging to certain lower-dimensional structures (e.g., sparse or low-rank sets), leading to so-called implicit bias. As a stepping stone, motivated by the proof structure of existing implicit bias analyses, we study when a gradient flow on a parameter θ implies an intrinsic gradient flow on a "lifted" variable z = φ(θ), for an architecture-related function φ. We formalize a so-called intrinsic dynamic property and show how it relates to the study of conservation laws associated with the factorization φ. This leads to a simple criterion based on the inclusion of kernels of linear maps, which yields a necessary condition for this property to hold. We then apply our theory to general ReLU networks of arbitrary depth and show that, when φ is the so-called path-lifting, for a dense set of initializations the flow can be rewritten as an intrinsic dynamic in a lower dimension that depends only on z and the initialization. In the case of linear networks, where φ is the product of weight matrices, the intrinsic dynamic is known to hold under so-called balanced initializations; we generalize this to a broader class of relaxed balanced initializations, showing that, in certain configurations, these are the only initializations that ensure the intrinsic metric property. Finally, for the linear neural ODE associated with the limit of infinitely deep linear networks, with relaxed balanced initialization, we make the corresponding intrinsic dynamics explicit.
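To make the mechanism concrete, the display below is a minimal sketch of the generic chain-rule argument behind such rewritings, assuming the training loss factors through the lifting as L = ℓ ∘ φ (a standard setup in this line of work); the notation ℓ and M is ours for illustration and need not match the paper's.

```latex
% Gradient flow on theta with a loss that factors through phi:
%   L(\theta) = \ell(\phi(\theta)), \qquad z = \phi(\theta).
\dot\theta(t) = -\nabla L(\theta(t)) = -\,D\phi(\theta(t))^\top \nabla \ell(z(t)),
\qquad
\dot z(t) = D\phi(\theta(t))\,\dot\theta(t)
          = -\,\underbrace{D\phi(\theta(t))\,D\phi(\theta(t))^\top}_{=:\,M(\theta(t))}\,\nabla \ell(z(t)).
% The dynamic on z is "intrinsic" when M(\theta(t)) can be written as a function of
% z(t) and the initialization alone, so that z solves a closed ODE
%   \dot z = -\widetilde{M}\big(z, \theta(0)\big)\,\nabla \ell(z).
```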

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates when gradient flow on network parameters induces an intrinsic gradient flow on a lifted variable, introducing a criterion based on conservation laws and kernel inclusions. It resides in the 'Intrinsic Dynamics and Conservation Laws' leaf, which contains only this single paper within the 50-paper taxonomy. This isolation suggests the specific focus on intrinsic dynamics via conservation laws and lifted parameterizations is relatively unexplored in the surveyed literature, occupying a sparse niche within the broader Training Dynamics and Trajectory Analysis branch.

The taxonomy reveals that neighboring leaves address related but distinct aspects of training dynamics. The 'Neural Tangent Kernel and Linearization Regimes' category examines lazy training where features remain fixed, while 'Feature Learning and Adaptive Regimes' studies nonlinear feature evolution. The 'Stability and Dynamical Properties' leaf characterizes perturbation effects and stable minima. This paper's focus on conservation laws and intrinsic geometric structure bridges these areas by providing a framework to understand when parameter-space dynamics can be rewritten in lower-dimensional lifted coordinates, complementing but not directly overlapping with kernel-regime or feature-learning analyses.

Among fourteen candidates examined, no contribution was clearly refuted by prior work. The first contribution on intrinsic dynamic properties examined one candidate with no refutation. The second contribution on ReLU networks via path-lifting examined three candidates, none refuting. The third contribution on relaxed balanced initializations for linear networks examined ten candidates, again with no refutations. This limited search scope—fourteen papers total—suggests the analysis captures closely related work but cannot claim exhaustive coverage of all potentially relevant prior art in implicit bias or conservation-law frameworks.

Based on the top-fourteen semantic matches and the taxonomy structure, the work appears to occupy a genuinely sparse research direction. The absence of sibling papers in its taxonomy leaf and the lack of refuting candidates among examined literature suggest novelty in its specific theoretical framework. However, the limited search scale means broader connections to implicit regularization, geometric optimization, or dynamical systems perspectives outside the examined set remain uncertain.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 14
Refutable Papers: 0

Research Landscape Overview

Core task: gradient flow dynamics in overparameterized neural networks. The field is organized around several complementary perspectives on how and why gradient-based training succeeds in the overparameterized regime. Convergence Theory and Global Optimization investigates conditions under which gradient descent provably reaches global minima, often leveraging neural tangent kernel (NTK) approximations or analyzing the loss landscape's benign geometry (e.g., Gradient descent provably optimizes[1], Stochastic gradient descent optimizes[4]). Generalization and Learning Theory examines how overparameterization affects test performance, exploring implicit regularization and algorithm-dependent bounds (e.g., Generalization error bounds of[3], Algorithm-Dependent Generalization Bounds for[31]). Training Dynamics and Trajectory Analysis focuses on the evolution of parameters and features during training, uncovering conservation laws, phase transitions, and intrinsic geometric structures that govern the optimization path (e.g., The dynamics of gradient[2], Understanding the dynamics of[6]). Loss Landscape Geometry and Critical Points studies the topological and geometric properties of the loss surface, while Implicit Regularization and Bias investigates the inductive biases that gradient flow induces. Specialized Architectures and Problem Settings and Advanced Optimization Methods and Algorithms address domain-specific challenges and algorithmic refinements.

A particularly active line of work examines the fine-grained trajectory structure and emergent phenomena during training, contrasting lazy (NTK-like) and feature-learning regimes. Some studies emphasize conservation laws and dynamical stability (e.g., Characterizing Dynamical Stability of[11], Dynamics and Perturbations of[15]), while others investigate how optimization trajectories exhibit hallmarks such as progressive sharpening or feature alignment (e.g., Hallmarks of Optimization Trajectories[41]).

Intrinsic training dynamics of[0] sits within this trajectory-focused branch, specifically exploring intrinsic dynamics and conservation laws that govern parameter evolution. Its emphasis on uncovering invariant quantities and geometric constraints during training aligns closely with works like The dynamics of gradient[2] and Understanding the dynamics of[6], yet it distinguishes itself by probing deeper into the conservation principles that persist across different initialization and architecture choices, offering a complementary lens to studies that focus primarily on convergence rates or generalization bounds.

Claimed Contributions

Intrinsic dynamic property and its characterization via conservation laws

The authors introduce the intrinsic dynamic property (Definition 2.6) and establish its relationship to conservation laws. They provide a simple criterion based on kernel inclusion of linear maps (Theorem 2.14) that yields a necessary condition for this property to hold, connecting gradient flows in parameter space to intrinsic flows in lifted variable space.

1 retrieved paper
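As a toy illustration of how conservation laws enter this contribution (a generic, well-known mechanism, not a restatement of the paper's Theorem 2.14; the scalar factorization below is a hypothetical example): when the loss factors as L(θ) = ℓ(φ(θ)), any quantity h whose gradient lies in ker(Dφ(θ)) is conserved along the gradient flow, since dh/dt = -(Dφ(θ)∇h(θ))ᵀ∇ℓ(z) = 0.

```python
import numpy as np

# Toy illustration of the conservation-law mechanism (not the paper's Theorem 2.14):
# with L(theta) = ell(phi(theta)), gradient flow conserves any h such that
# Dphi(theta) @ grad_h(theta) = 0, i.e. grad_h(theta) lies in ker(Dphi(theta)).
# Here phi(u, v) = u * v is a scalar "two-layer" factorisation and
# h(u, v) = u**2 - v**2 is the classical balancedness quantity.

def phi(theta):
    u, v = theta
    return u * v

def d_phi(theta):                          # Jacobian of phi: a 1x2 linear map
    u, v = theta
    return np.array([[v, u]])

def grad_h(theta):                         # gradient of h(u, v) = u^2 - v^2
    u, v = theta
    return np.array([2 * u, -2 * v])

def grad_loss(theta, target=1.5):          # L(theta) = 0.5 * (phi(theta) - target)^2
    return d_phi(theta).T @ np.array([phi(theta) - target])

theta = np.array([0.8, -0.3])
print(d_phi(theta) @ grad_h(theta))        # [0.]: grad_h lies in ker(Dphi) at every theta

# Hence h is (numerically) constant along the discretised gradient flow.
h0 = theta[0] ** 2 - theta[1] ** 2
dt = 1e-3
for _ in range(20000):
    theta = theta - dt * grad_loss(theta)
print(h0, theta[0] ** 2 - theta[1] ** 2)   # nearly equal, while phi(theta) -> target
```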
Intrinsic dynamics for general ReLU networks via path-lifting

The authors prove that for general ReLU networks of arbitrary depth using path-lifting reparametrization, the gradient flow can be rewritten as an intrinsic dynamic for a dense set of initializations (Theorem 3.1 and Corollary 3.2). This extends previous results limited to two-layer networks to arbitrary DAG architectures.

3 retrieved papers
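For readers unfamiliar with the path-lifting invoked in this contribution: for a bias-free two-layer ReLU network it collects the products of weights along each input-hidden-output path, and it is invariant to the rescaling symmetries of ReLU units. The sketch below illustrates this on a toy network; it is a simplified stand-in for the general DAG construction used in the paper, and all function and variable names are ours.

```python
import numpy as np

# Minimal sketch of a path-lifting for a bias-free two-layer ReLU network
#   f(W, v; x) = sum_j v[j] * relu(W[j, :] @ x).
# Each coordinate of the lifted variable z = phi(W, v) is the product of the weights
# along one input -> hidden unit -> output path: z[j, k] = v[j] * W[j, k].
# Simplified illustration only; the paper's phi covers general DAG ReLU networks.

def forward(W, v, x):
    return v @ np.maximum(W @ x, 0.0)

def path_lifting(W, v):
    # One path per (hidden unit j, input coordinate k).
    return v[:, None] * W                              # shape: (hidden, input)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
v = rng.normal(size=4)
x = rng.normal(size=3)

# On any input, the output is a linear function of x whose coefficients are the
# path-lifting coordinates of the currently active paths.
active = (W @ x > 0).astype(float)                     # which hidden units fire on x
lhs = forward(W, v, x)
rhs = (active[:, None] * path_lifting(W, v) * x[None, :]).sum()
print(np.isclose(lhs, rhs))                            # True

# z is invariant under the ReLU rescaling symmetry (W[j] -> lam*W[j], v[j] -> v[j]/lam),
# one reason it is a natural "lifted" coordinate for ReLU networks.
lam = 2.5
W2, v2 = W.copy(), v.copy()
W2[0] *= lam
v2[0] /= lam
print(np.allclose(path_lifting(W, v), path_lifting(W2, v2)))   # True
```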
Relaxed balanced initializations for linear networks

The authors introduce relaxed balanced initializations (Definition 3.4) as a generalization of the balanced conditions for linear networks. They prove these initializations satisfy the intrinsic metric property (Theorem 3.6, Theorem 3.9) and show that in certain configurations they are necessary and sufficient (Theorem 3.7). They also provide explicit intrinsic dynamics for linear neural ODEs under these conditions (Theorem 3.11); a numerical sketch of the classical balanced case follows this item.

10 retrieved papers
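For context on the balanced condition being relaxed here: in deep linear networks trained by gradient flow on a quadratic loss, the defects W_{i+1}ᵀ W_{i+1} - W_i W_iᵀ are conserved, so balanced initializations (all defects zero) stay balanced along the flow, which is what lets the end-to-end product follow its own intrinsic dynamic. The sketch below checks this classical conservation numerically; it does not reproduce the paper's relaxed condition (Definition 3.4), and the setup (dimensions, loss, step size) is a hypothetical example of ours.

```python
import numpy as np

# Sketch of a standard deep-linear-network fact (balanced case only; the paper's
# "relaxed balanced" condition of Definition 3.4 is not reproduced here):
# for f(x) = W[L-1] @ ... @ W[0] @ x trained by gradient flow on a quadratic loss,
# the defects D_i = W[i+1].T @ W[i+1] - W[i] @ W[i].T are conserved. Balanced
# initializations (all D_i = 0) therefore remain balanced, which is what allows
# the product W[L-1]...W[0] to follow its own (intrinsic) dynamic.

rng = np.random.default_rng(0)
dims = [3, 4, 4, 2]                                    # layer widths d0 -> d3
Ws = [rng.normal(size=(dims[i + 1], dims[i])) / np.sqrt(dims[i]) for i in range(3)]
X = rng.normal(size=(3, 32))
Y = rng.normal(size=(2, 32))

def product(Ws):
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    return P

def defects(Ws):
    return [Ws[i + 1].T @ Ws[i + 1] - Ws[i] @ Ws[i].T for i in range(len(Ws) - 1)]

def grads(Ws):
    # Gradients of 0.5/n * ||W[L-1]...W[0] X - Y||_F^2 with respect to each factor.
    n = X.shape[1]
    R = product(Ws) @ X - Y
    gs = []
    for i in range(len(Ws)):
        left = product(Ws[i + 1:]) if i + 1 < len(Ws) else np.eye(dims[-1])
        right = product(Ws[:i]) @ X if i > 0 else X
        gs.append(left.T @ R @ right.T / n)
    return gs

d0 = defects(Ws)
dt = 1e-3
for _ in range(5000):                                  # forward-Euler discretisation of the flow
    Ws = [W - dt * g for W, g in zip(Ws, grads(Ws))]

drift = [np.linalg.norm(a - b) for a, b in zip(d0, defects(Ws))]
print(drift)   # small: exactly conserved by the continuous-time flow, up to Euler error here
```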

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one limited by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Intrinsic dynamic property and its characterization via conservation laws

The authors introduce the intrinsic dynamic property (Definition 2.6) and establish its relationship to conservation laws. They provide a simple criterion based on kernel inclusion of linear maps (Theorem 2.14) that yields a necessary condition for this property to hold, connecting gradient flows in parameter space to intrinsic flows in lifted variable space.

Contribution

Intrinsic dynamics for general ReLU networks via path-lifting

The authors prove that for general ReLU networks of arbitrary depth using path-lifting reparametrization, the gradient flow can be rewritten as an intrinsic dynamic for a dense set of initializations (Theorem 3.1 and Corollary 3.2). This extends previous results limited to two-layer networks to arbitrary DAG architectures.

Contribution

Relaxed balanced initializations for linear networks

The authors introduce relaxed balanced initializations (Definition 3.4) as a generalization of balanced conditions for linear networks. They prove these initializations satisfy the intrinsic metric property (Theorem 3.6, Theorem 3.9) and show that in certain configurations, these are necessary and sufficient conditions (Theorem 3.7). They also provide explicit intrinsic dynamics for linear neural ODEs under these conditions (Theorem 3.11).