Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Architectures

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: learning dynamics, gradient flow, simplicity bias
Abstract:

Neural networks trained with gradient descent often learn solutions of increasing complexity over time, a phenomenon known as simplicity bias. Despite being widely observed across architectures, existing theoretical treatments lack a unifying framework. We present a theoretical framework that explains the simplicity bias arising from saddle-to-saddle learning dynamics for a general class of neural networks, encompassing fully-connected, convolutional, and attention-based architectures. Here, simple means expressible with few hidden units, i.e., hidden neurons, convolutional kernels, or attention heads. Specifically, we show that linear networks learn solutions of increasing rank, ReLU networks learn solutions with an increasing number of kinks, convolutional networks learn solutions with an increasing number of convolutional kernels, and self-attention models learn solutions with an increasing number of attention heads. By analyzing the fixed points, invariant manifolds, and dynamics of gradient descent learning, we show that saddle-to-saddle dynamics operates by iteratively evolving near an invariant manifold, approaching a saddle, and switching to another invariant manifold. Our analysis also illuminates how the data distribution and initialization shape the duration and number of plateaus in learning, dissociating previously confounded factors. Overall, our theory offers a framework for understanding when and why gradient descent progressively learns increasingly complex solutions.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper presents a unified theoretical framework explaining simplicity bias through saddle-to-saddle dynamics across multiple architectures (fully-connected, convolutional, attention-based). It resides in the 'Gradient Flow Dynamics and Convergence Analysis' leaf, which contains five papers total, making this a moderately populated research direction within the broader theoretical characterization branch. The work addresses a core gap: existing treatments lack unifying principles across architectures, whereas this framework shows how different network types (linear, ReLU, convolutional, self-attention) exhibit increasing complexity through architecture-specific mechanisms (rank, kinks, kernels, attention heads).

The taxonomy reveals several neighboring research directions that contextualize this contribution. The sibling leaf 'Implicit Bias and Inductive Bias Characterization' (seven papers) explores related biases toward low-rank or structured solutions but focuses less on dynamical mechanisms. The 'Frequency and Spectral Perspectives' leaf (four papers) analyzes simplicity through frequency domain properties rather than saddle dynamics. Within 'Architecture-Specific Analyses', separate leaves examine transformers and RNNs individually, whereas this work provides cross-architecture unification. The framework bridges theoretical dynamics (its home leaf) with architecture-specific manifestations (a separate branch), positioning it at an intersection of two major research threads.

Among the twenty-two candidates examined, the contribution-level analysis shows mixed novelty signals. The unified saddle-to-saddle framework (Contribution 1) was compared against five candidates with zero refutations, suggesting relative novelty in its cross-architecture unification. However, the characterization of embedded fixed points and invariant manifolds (Contribution 2) was compared against seven candidates, two of which constitute refutable overlaps, indicating more substantial prior work on these mathematical structures. The architecture-specific timescale-separation mechanisms (Contribution 3) were compared against ten candidates without refutations, though the larger candidate pool suggests this area has received more research attention. The limited search scope (twenty-two papers, not hundreds) means these assessments reflect top semantic matches rather than exhaustive coverage.

Based on the available signals, the work appears to offer meaningful theoretical synthesis by unifying previously disparate architecture-specific analyses under a common dynamical framework. The taxonomy structure shows this sits in a moderately active area (five sibling papers) rather than a sparse frontier, and the contribution-level statistics suggest the unification aspect (Contribution 1) may be more novel than the underlying mathematical tools (Contribution 2). The analysis covers top-ranked semantic matches and does not claim comprehensive field coverage, so additional related work may exist beyond the examined candidates.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Papers: 2

Research Landscape Overview

Core task: Simplicity bias in neural network learning dynamics. The field examines how neural networks preferentially learn simpler patterns or representations during training, even when more complex solutions exist.

The taxonomy organizes this research into six main branches. Theoretical Characterization explores mathematical foundations through gradient flow dynamics and convergence analyses, often leveraging tools from optimization theory to explain why networks favor low-complexity solutions (e.g., Margin Maximization Simplicity[26], Implicit Bias Linear[43]). Empirical Studies document observed simplicity phenomena across diverse settings, while Architecture-Specific Analyses investigate how particular network designs—such as transformers (Distributional Simplicity Transformers[4]) or ReLU networks (Optimization Threshold ReLU[5])—exhibit distinct biases. Complexity and Spurious Correlation Dynamics examines the interplay between learning simple features and relying on spurious correlations (Spurious Correlations Dynamics[17], Early Spurious Biases[28]). Mitigation Strategies propose interventions to counteract harmful simplicity biases, and Visualization and Analysis Tools provide methods for diagnosing these phenomena in practice.

Several active lines of work reveal key trade-offs and open questions. One central theme contrasts the benefits of simplicity bias for generalization against its pitfalls when simple features are misleading (Pitfalls Simplicity Bias[3]). Another explores how early learning dynamics (Early Learning Dynamics[9]) shape which features networks prioritize, with some studies showing that networks lock onto simple patterns before exploring complex ones.

Saddle Dynamics Simplicity[0] sits within the Theoretical Characterization branch, specifically addressing gradient flow dynamics and convergence. It shares thematic ground with Saddle Dynamics Architectures[36], which also examines saddle-point behavior across different network designs, and complements Simplicity Beyond Separable[1], which extends theoretical understanding to non-separable settings. By analyzing how saddle points influence the trajectory toward simpler solutions, this work contributes to the foundational understanding of why and when simplicity bias emerges during optimization.

Claimed Contributions

Unified theoretical framework for saddle-to-saddle dynamics across architectures

The authors develop a unified framework explaining how saddle-to-saddle dynamics produces simplicity bias across multiple neural network architectures (fully-connected, convolutional, attention-based). The framework shows that networks progressively learn solutions of increasing complexity by recruiting additional effective units (neurons, kernels, or attention heads) through iterative transitions between saddle points connected by invariant manifolds.

5 retrieved papers
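The unit-recruitment picture for the linear case can be illustrated with a small numerical sketch. This is an illustrative toy in the standard deep-linear-network setting, not the paper's construction; the teacher matrix, initialization scale, and learning rate are all chosen here for demonstration. Gradient descent from small initialization learns the strong singular mode first, plateaus near a saddle, then recruits the weak mode:

```python
import numpy as np

# Toy saddle-to-saddle dynamics in a two-layer linear network.
# Teacher with well-separated singular values; small init puts
# the trajectory in the saddle-to-saddle regime.
rng = np.random.default_rng(0)
d = 2
W_star = np.diag([4.0, 1.0])          # teacher map, singular values 4 and 1
scale = 1e-3                           # small initialization
W1 = scale * rng.standard_normal((d, d))
W2 = scale * rng.standard_normal((d, d))

lr = 0.01
singular_values = []                   # snapshots of the learned map's spectrum
for step in range(6000):
    E = W_star - W2 @ W1               # residual of ||W* - W2 W1||^2 / 2
    dW1 = W2.T @ E                     # gradients of the square loss
    dW2 = E @ W1.T
    W1 += lr * dW1
    W2 += lr * dW2
    if step % 100 == 0:
        singular_values.append(np.linalg.svd(W2 @ W1, compute_uv=False))
```

Early snapshots show the top singular value already close to 4 while the second is still near 0; both modes are present by the end, matching the increasing-rank description above.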
Characterization of embedded fixed points and invariant manifolds

The authors establish that fixed points of narrower networks become saddle points in wider networks, creating a recursive embedding structure. They further prove that these saddles are connected by invariant manifolds along which wider networks behave like narrower ones, preserving simplicity along connecting trajectories.

7 retrieved papers
Can Refute
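The invariant-manifold claim has a simple concrete instance that can be checked numerically. This is an illustrative sketch, not the paper's proof: in a two-layer network with scalar output, a hidden unit whose in- and out-weights are both zero receives exactly zero gradient, so the width-2 network restricted to that manifold evolves exactly like a width-1 network (the target vector and hyperparameters below are arbitrary choices):

```python
import numpy as np

# f(x) = sum_i a_i * (w_i . x), square loss on whitened inputs:
# L = ||a @ w - beta||^2 / 2.  Unit 2 starts with zero weights.
beta = np.array([2.0, -1.0])          # target linear map
w = np.array([[0.1, 0.05],            # unit 1: active
              [0.0, 0.0]])            # unit 2: on the zero (invariant) manifold
a = np.array([0.1, 0.0])              # output weights
lr = 0.05
for _ in range(500):
    r = a @ w - beta                  # residual
    grad_w = np.outer(a, r)           # dL/dw_i = a_i * r  -> exactly 0 for unit 2
    grad_a = w @ r                    # dL/da_i = w_i . r  -> exactly 0 for unit 2
    w -= lr * grad_w
    a -= lr * grad_a
```

Unit 2's weights are bit-for-bit unchanged after training (its gradient is identically zero on the manifold), while unit 1 alone fits the target; the wider network behaves like the narrower one along the whole trajectory.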
Architecture-specific mechanisms for timescale separation

The authors identify two distinct mechanisms driving saddle-to-saddle dynamics: timescale separation between directions (for linear architectures, due to data distribution) and timescale separation between units (for quadratic architectures like self-attention, due to initialization). These mechanisms explain how different architectures progressively learn increasingly complex solutions with architecture-specific notions of simplicity.

10 retrieved papers
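The initialization-driven mechanism can be sketched with a minimal quadratic-parametrization toy. This is an assumption-laden illustration, not the paper's model: with theta_i = u_i**2, each unit's gradient is proportional to its own weight, so the escape time from the origin saddle is set by the unit's initialization scale rather than by the data (contrast the linear case, where ordering follows the data spectrum):

```python
import numpy as np

# Two interchangeable "units" fitting a scalar target with theta = sum_i u_i^2.
# L = (u.u - target)^2 / 2, so dL/du_i = 2 r u_i: growth rate scales with u_i,
# and the unit with the larger initialization escapes the saddle first.
target = 1.0
u = np.array([1e-2, 1e-4])            # unit 0 initialized 100x larger
lr = 0.05
for _ in range(3000):
    r = u @ u - target                # residual
    u -= lr * 2.0 * r * u             # gradient step
```

Unit 0 escapes first and fits the target; unit 1 barely moves before the residual vanishes and it freezes. The order in which units are recruited is therefore determined by initialization, which is the timescale-separation-between-units mechanism described above.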

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Unified theoretical framework for saddle-to-saddle dynamics across architectures

The authors develop a unified framework explaining how saddle-to-saddle dynamics produces simplicity bias across multiple neural network architectures (fully-connected, convolutional, attention-based). The framework shows that networks progressively learn solutions of increasing complexity by recruiting additional effective units (neurons, kernels, or attention heads) through iterative transitions between saddle points connected by invariant manifolds.

Contribution

Characterization of embedded fixed points and invariant manifolds

The authors establish that fixed points of narrower networks become saddle points in wider networks, creating a recursive embedding structure. They further prove that these saddles are connected by invariant manifolds along which wider networks behave like narrower ones, preserving simplicity along connecting trajectories.

Contribution

Architecture-specific mechanisms for timescale separation

The authors identify two distinct mechanisms driving saddle-to-saddle dynamics: timescale separation between directions (for linear architectures, due to data distribution) and timescale separation between units (for quadratic architectures like self-attention, due to initialization). These mechanisms explain how different architectures progressively learn increasingly complex solutions with architecture-specific notions of simplicity.