KDP: Simplifying Representation Dynamics in Kernel Space

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Large Language Models; Model Compression; Structured Pruning; Kernel Space
Abstract:

This paper proposes Kernelized Dynamics Pruning (KDP), a novel layer pruning method from the perspective of simplifying representation dynamics within large language models (LLMs). Motivated by the high similarity between consecutive layer representations, we view the LLM's forward pass as a discrete-time dynamical system. We speculate that this phenomenon indicates the model's internal dynamics have entered a "slow manifold", which exhibits computational redundancy. Based on this insight, we project the representations into a kernel space where the complex, non-linear transformation between them is simplified to an approximately linear one. Then, a simple network learns the inverse kernel transformation, thereby enabling the pruning of the entire layer block. Both theoretical analysis and extensive experiments validate the effectiveness of KDP, demonstrating its superiority over existing pruning baselines. Code is available at https://anonymous.4open.science/r/draft-123abc.
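The abstract's pipeline (kernel projection, linear map in kernel space, learned inverse) can be sketched numerically. Everything below is an illustrative assumption, not the paper's implementation: hidden states are simulated with a synthetic `tanh` block, the kernel map is random Fourier features, and the "inverse kernel transformation" network is stood in for by ridge regression.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for hidden states before and after a block of layers
# (synthetic data; the real method would use actual LLM activations).
n, d = 512, 16
H_in = rng.normal(size=(n, d))
W = rng.normal(size=(d, d)) / np.sqrt(d)
H_out = np.tanh(H_in @ W)          # non-linear multi-layer map to approximate

# Random Fourier features as a cheap explicit kernel map phi: R^d -> R^D.
D = 256
Omega = rng.normal(size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
def phi(X):
    return np.sqrt(2.0 / D) * np.cos(X @ Omega + b)

# 1) Fit a LINEAR map A in kernel space: phi(H_in) @ A ~= phi(H_out).
A, *_ = np.linalg.lstsq(phi(H_in), phi(H_out), rcond=None)

# 2) Learn an approximate inverse of phi (ridge regression here, standing in
#    for the paper's "simple network"): kernel features -> representations.
F_out = phi(H_out)
lam = 1e-3
Psi = np.linalg.solve(F_out.T @ F_out + lam * np.eye(D), F_out.T @ H_out)

# Replacement for the whole pruned block: h -> inverse(A^T phi(h)).
H_pred = phi(H_in) @ A @ Psi

rel_err = np.linalg.norm(H_pred - H_out) / np.linalg.norm(H_out)
print(round(float(rel_err), 3))
```

On this toy data the two-stage surrogate recovers a large fraction of the block's output; the actual quality on real activations would depend on the kernel and the inverse network's capacity.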

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Kernelized Dynamics Pruning (KDP), which frames layer pruning through a dynamical systems lens by projecting representations into kernel space to linearize transformations. Within the taxonomy, it occupies the 'Kernel Space and Dynamical Systems Perspectives' leaf under 'Representation Dynamics and Theoretical Foundations'. Notably, this leaf contains only the original paper itself—no sibling papers exist in this specific category. This positioning suggests the work explores a relatively sparse theoretical direction, distinct from the more populated empirical branches like 'Uniform and Block-Based Layer Removal' or 'Similarity-Based Layer Importance Metrics'.

The taxonomy reveals that neighboring leaves focus on empirical robustness studies ('Robustness and Stages of Inference') and knowledge localization ('Layer Functionality and Knowledge Localization'), while sibling branches address practical removal strategies and compensation mechanisms. The 'Representation Dynamics and Theoretical Foundations' parent branch itself is less crowded than 'Layer Removal Strategies', which contains multiple subtopics with numerous papers. KDP's kernel-based formulation diverges from activation-based or similarity-based importance metrics, instead offering a mathematical framework that connects to but does not directly overlap with empirical layer removal methods like ShortGPT or Slimming LLMs.

Across three contributions, the analysis examined 17 candidate papers in total, with no clear refutations identified. The core KDP method was compared against 5 candidates with 0 refutable matches; the theoretical error bound against 10 candidates with 0 refutations; and the geometric embedding reformulation against 2 candidates with 0 refutations. Within this limited scope of top-K semantic matches, no prior work appears to combine kernel-space linearization with learned inverse transformations for layer pruning. The theoretical contributions, particularly the error bound and geometric reformulation, show no overlapping prior work within the examined candidates.

Based on the limited literature search of 17 candidates, the work appears to occupy a novel theoretical niche within layer pruning research. The absence of sibling papers in its taxonomy leaf and the lack of refutable prior work among examined candidates suggest distinctiveness, though the search scope does not cover the entire field. The kernel-based dynamical systems perspective represents a less-explored angle compared to the more populated empirical and heuristic pruning branches.

Taxonomy

Core-task Taxonomy Papers: 49
Claimed Contributions: 3
Contribution Candidate Papers Compared: 17
Refutable Papers: 0

Research Landscape Overview

Core task: Simplifying representation dynamics in large language models through layer pruning. The field has organized itself around several complementary perspectives. At the highest level, researchers distinguish between methods that analyze which layers matter (Layer Importance Analysis and Measurement), strategies for actually removing layers (Layer Removal Strategies), and theoretical work examining how representations evolve across depth (Representation Dynamics and Theoretical Foundations). Parallel branches address compensation mechanisms that restore performance after pruning, intra-layer pruning that targets weights or attention heads rather than entire layers, and task-specific approaches tailored to particular applications. Hybrid frameworks combine multiple pruning strategies, while other branches explore architectural variants and specialized contexts such as federated learning or constrained deployment. Representative works like Slimming LLMs[1] and ShortGPT Layer Redundancy[45] illustrate how layer removal strategies operate in practice, whereas Investigating Layer Importance[14] and Layer Importance Hallucination[4] exemplify efforts to measure and understand layer contributions.

A central tension runs through the literature: some studies argue that deeper layers become less effective or even redundant (Deeper Layers Ineffectiveness[2]), while others find that high layers play critical roles in specific tasks (High Layer Attention[21]). Dynamic approaches like Dynamic Layerwise Pruning[3] and Dynamic Layer Selection[25] attempt to reconcile these views by adapting pruning decisions to input or task context.

The original paper, KDP Kernel Space[0], sits within the theoretical foundations branch, offering a dynamical systems perspective on how layer transformations evolve. This contrasts with more empirical measurement studies like Investigating Layer Importance[14] and complements structural methods such as LLM Pruner[5], which focus on dependency-aware removal. By framing layer dynamics through kernel space analysis, KDP Kernel Space[0] provides a mathematical lens that bridges abstract representation theory with practical pruning decisions, addressing why certain layers can be simplified without degrading model behavior.

Claimed Contributions

Kernelized Dynamics Pruning (KDP) method

The authors introduce KDP, a layer pruning approach that projects LLM representations into a kernel space where complex non-linear transformations between layers are simplified to approximately linear ones, enabling entire layer blocks to be pruned while maintaining performance.

5 retrieved papers
Theoretical error bound for kernel linearization

The authors establish Theorem 1 providing an error bound for approximating multi-layer representations with linear transformations in kernel space, and Theorem 2 demonstrating that kernel space exhibits superior fitting capacity compared to the original representation space.

10 retrieved papers
Reformulation of layer pruning as geometric embedding search

The authors reframe the layer pruning problem as finding an optimal geometric viewpoint in a Reproducing Kernel Hilbert Space where the inherent simplicity of complex dynamics can be revealed, rather than merely constructing smaller sub-networks.

2 retrieved papers
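The "optimal geometric viewpoint" framing can be made concrete with one hypothetical objective; the notation below is assumed for illustration and is not taken from the paper. The idea is to choose a feature map into an RKHS, together with a linear operator, so that the multi-layer block map looks linear from that viewpoint:

```latex
% Hypothetical formalization; \phi, A, and f_{l:l+k} are assumed notation.
% \phi : \mathbb{R}^{d} \to \mathcal{H} embeds representations into an RKHS,
% and f_{l:l+k} denotes the map computed by layers l through l+k.
\min_{\phi,\, A}\;
  \mathbb{E}_{h}\,
  \bigl\| \phi\bigl(f_{l:l+k}(h)\bigr) - A\,\phi(h) \bigr\|_{\mathcal{H}}^{2}
```

Under such an objective, pruning amounts to searching over embeddings rather than over sub-networks: once a viewpoint \(\phi\) is found in which the dynamics are (approximately) linear, the block can be replaced by \(A\) plus an inverse map.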

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though the signal is constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution
Kernelized Dynamics Pruning (KDP) method
Candidates examined: 5; refutable matches found: 0.

Contribution
Theoretical error bound for kernel linearization
Candidates examined: 10; refutable matches found: 0.

Contribution
Reformulation of layer pruning as geometric embedding search
Candidates examined: 2; refutable matches found: 0.