Jet Expansions: Restructuring LLM Computation for Model Inspection

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: transformer, decomposition, interpretability, neural-symbolic, n-grams, XAI
Abstract:

Large language models are becoming general knowledge engines for diverse applications. However, their computations are deeply entangled after training, resisting modularization, which complicates interpretability, auditing, and long-term maintenance. We introduce Jet Expansions, a framework for expanding computational graphs using jet operators that generalize truncated Taylor series. Our method systematically decomposes language models into explicit input-to-output computational paths and complementary remainders. This functional decomposition provides a principled, knife-like operator for cutting through entanglement in LLMs, enabling scalable model inspection. We demonstrate how Jet Expansions ground and subsume the popular interpretability technique Logit Lens, reveal a (super-)exponential path structure with respect to recursive residual depth, and support several interpretability applications, including sketching a transformer language model with n-gram statistics extracted from its computations and indexing model toxicity levels without curated benchmarks.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Jet Expansions, a mathematical framework for decomposing language model computations into explicit input-to-output paths using generalized Taylor series operators. It resides in the 'Computational Path Formalization and Theory' leaf, which contains only two papers total. This is one of the sparsest research directions in the taxonomy, indicating a relatively underexplored theoretical niche focused on rigorous mathematical formalizations rather than empirical circuit discovery or application-driven interpretability.

The taxonomy reveals substantial activity in neighboring areas: Activation-Based Decomposition (five papers across two leaves) focuses on extracting features from hidden states, while Weight-Based and Circuit-Level Analysis (eight papers) emphasizes causal subgraph identification. Reasoning Path Decomposition (sixteen papers across three leaves) targets multi-step logic chains. Jet Expansions diverges by providing mathematical foundations for these empirical methods rather than proposing new feature extraction or circuit-tracing techniques. Its scope note explicitly excludes empirical circuit discovery, positioning it as theoretical infrastructure.

Among twenty-eight candidates examined, the contribution-level analysis shows mixed novelty signals. The core Jet Expansions framework (ten candidates examined, zero refutations) and the function decomposition perspective (ten candidates, zero refutations) appear relatively novel within the limited search scope. However, the claim of grounding existing interpretability tools encountered one refutable candidate among eight examined, suggesting some theoretical overlap with prior formalization efforts. The search scale is modest (top-K semantic matches plus citations), so these findings reflect local rather than exhaustive coverage.

Given the sparse theoretical leaf and limited search scope, the work appears to occupy a distinct formal niche. The framework's mathematical rigor and higher-order expansion machinery differentiate it from variance-based methods like Neural-ANOVA, though the grounding of existing tools shows some precedent. The analysis covers approximately thirty semantically related papers, leaving open whether broader theoretical literature in adjacent fields might reveal additional connections.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Paper: 1

Research Landscape Overview

Core task: decomposing language model computations into interpretable paths. The field has organized itself around several complementary perspectives on how to make neural network processing transparent. Activation-Based Decomposition and Feature Extraction focuses on identifying meaningful units within hidden representations, often using sparse autoencoders (Gated Sparse Autoencoders[4]) or feature circuits (Sparse Feature Circuits[1]) to isolate interpretable components. Weight-Based and Circuit-Level Analysis examines the connectivity and parameter structure that determines information flow, while Reasoning Path Decomposition and Explanation targets the step-by-step logic in multi-hop or chain-of-thought settings. Specialized Interpretability Applications adapt these techniques to domains like finance (Financial MI[7]) or knowledge graphs (KG-TRACES[8]), and Computational Path Formalization and Theory provides the mathematical underpinnings, such as decomposition algebras or probabilistic frameworks (Probabilistic Layer Decomposition[22]), that unify diverse methods. Architectural and Representational Foundations study how model design choices shape interpretability, and Cross-Domain and Emerging Applications explore novel settings from wireless networks (CoT Wireless[16]) to biological communication (Sperm Whale Vocalization[23]).

A particularly active line of work centers on formalizing how computations can be rigorously partitioned into additive or multiplicative contributions, with methods like Neural-ANOVA[44] offering variance-based decompositions and Jet Expansions[0] introducing higher-order Taylor-like expansions to capture nonlinear interactions. These theoretical frameworks contrast with more empirical circuit-tracing approaches (Information Flow Routes[15], Task-Specific Circuits[26]) that identify which subnetworks are causally responsible for specific behaviors.
Jet Expansions[0] sits squarely within the Computational Path Formalization branch, emphasizing rigorous mathematical decomposition rather than heuristic feature extraction. Compared to Neural-ANOVA[44], which partitions variance across input dimensions, Jet Expansions[0] extends the toolkit to higher-order terms, enabling finer-grained attribution of model outputs to interactions among features. This formal approach complements activation-based methods (Nonlinear Features[5]) by providing a principled basis for understanding how complex, nonlinear transformations emerge from simpler computational primitives.
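To make the contrast with variance-based, first-order attribution concrete, consider a toy function that is pure interaction (this sketch is not from either paper; `f` and `jet2` are illustrative names): at the expansion point, all first-order terms vanish, and the output is carried entirely by a second-order cross term, exactly the kind of effect that higher-order expansion terms are meant to capture.

```python
# Toy function whose output is pure interaction between its inputs.
def f(x, y):
    return x * y

# Second-order Taylor expansion of f around (0, 0):
# f(0, 0) = 0, both first derivatives at the origin are 0,
# and the only surviving Hessian entry is the cross derivative d2f/dxdy = 1.
def jet2(x, y):
    const = 0.0                   # f(0, 0)
    linear = 0.0 * x + 0.0 * y    # vanishing gradient terms
    cross = 1.0 * x * y           # second-order interaction term
    return const + linear + cross

# Because f is itself quadratic, the second-order jet is exact:
# a purely linear (first-order) attribution would report zero here.
assert jet2(0.3, -0.7) == f(0.3, -0.7)
```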

Claimed Contributions

Jet Expansions framework for restructuring LLM computations

The authors propose a principled mathematical framework that uses jet operators (functional counterparts of truncated Taylor series) to systematically decompose language models into explicit input-to-output computational paths and complementary remainders. This functional decomposition provides a systematic operator for cutting through entanglement in LLMs, enabling scalable model inspection without requiring additional data or training.

10 retrieved papers
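A minimal sketch of the jet idea on a scalar function (illustrative only; `jet1`, `approx`, and `remainder` are hypothetical names, and the paper applies such operators to vector-valued network components rather than scalars): a first-order jet replaces a function with its truncated Taylor series at an expansion point, and the remainder is defined so that jet plus remainder reproduces the original function exactly.

```python
import math

def jet1(f, df, x0):
    """First-order jet of f at expansion point x0: returns the
    truncated Taylor approximation and the exact remainder."""
    def approx(x):
        return f(x0) + df(x0) * (x - x0)
    def remainder(x):
        return f(x) - approx(x)
    return approx, remainder

# Toy example: expand exp around x0 = 0 (exp is its own derivative).
approx, rem = jet1(math.exp, math.exp, 0.0)

# Jet plus remainder recovers the original function exactly.
assert abs(approx(0.1) + rem(0.1) - math.exp(0.1)) < 1e-12
print(approx(0.1))  # 1 + 0.1 = 1.1
```

The same pattern, applied recursively to components of a network, is what yields explicit computational paths plus complementary remainder terms.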
Treating interpretability as function decomposition

The authors introduce a conceptual shift in interpretability methodology by framing it as a problem of function decomposition rather than traditional data-driven approaches. This perspective enables manipulation of functions directly in function space, requiring no probe datasets or sampling, and allows arbitrary portions of computation to be isolated from the monolithic transformer.

10 retrieved papers
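The exponential path structure behind this decomposition can be seen in a toy residual stack (a sketch, not the paper's construction; linear blocks g_i(x) = a_i * x are assumed so the path sum is exact with zero remainder): expanding (id + g_L) ∘ … ∘ (id + g_1) produces one additive path per subset of blocks traversed, 2^L paths in total.

```python
from itertools import product

# Toy residual stack with linear blocks g_i(x) = a_i * x.
coeffs = [0.5, -0.2, 0.3]

def full_network(x):
    for a in coeffs:
        x = x + a * x          # residual block: id + g_i
    return x

def sum_over_paths(x):
    """Evaluate every input-to-output path (one per subset of blocks)
    and sum them; for linear blocks this matches the full network."""
    total = 0.0
    for choice in product([0, 1], repeat=len(coeffs)):
        v = x
        for take, a in zip(choice, coeffs):
            if take:
                v = a * v      # apply g_i along this path
        total += v
    return total

assert abs(full_network(2.0) - sum_over_paths(2.0)) < 1e-12
print(2 ** len(coeffs))  # number of paths: 8
```

No probe data or sampling is involved: the decomposition is a manipulation of the function itself, which is the conceptual shift this contribution claims.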
Theoretical grounding of existing interpretability tools

The authors establish a rigorous mathematical foundation using jet operators that subsumes and generalizes existing interpretability techniques like Logit Lens and path expansion methods. This framework provides formal justification for these tools and extends them to new instantiations such as extracting n-gram probability tables directly from LLMs without requiring corpus data.

8 retrieved papers
Can Refute
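A toy illustration of the Logit Lens that this contribution claims to subsume (random weights; `W_U`, `h`, and `deltas` are made-up stand-ins, and LayerNorm is omitted): the lens applies the final unembedding to intermediate residual states, i.e., it reads out the path that routes the current residual state directly to the output head while deferring later layers' contributions to a remainder.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 4, 3, 2
W_U = rng.normal(size=(d_model, vocab))   # unembedding matrix
h = rng.normal(size=d_model)              # residual stream after embedding
deltas = [rng.normal(size=d_model) for _ in range(n_layers)]  # per-layer updates

# Logit Lens: apply the unembedding to the residual stream after each
# layer, reading off a token distribution before the network finishes.
for i, delta in enumerate(deltas):
    h = h + delta
    logits = h @ W_U
    print(f"after layer {i}: predicted token {int(np.argmax(logits))}")
```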

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Jet Expansions framework for restructuring LLM computations

Contribution

Treating interpretability as function decomposition

Contribution

Theoretical grounding of existing interpretability tools