Jet Expansions: Restructuring LLM Computation for Model Inspection

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: transformer, decomposition, interpretability, neural-symbolic, n-grams, XAI
Abstract:

Large language models are becoming general knowledge engines for diverse applications. However, their computations are deeply entangled after training, resisting modularization, which complicates interpretability, auditing, and long-term maintenance. We introduce Jet Expansions, a framework for expanding computational graphs using jet operators that generalize truncated Taylor series. Our method systematically decomposes language models into explicit input-to-output computational paths and complementary remainders. This functional decomposition provides a principled, knife-like operator for cutting through entanglement in LLMs, enabling scalable model inspection. We demonstrate how Jet Expansions ground and subsume the popular interpretability technique Logit Lens, reveal a (super-)exponential path structure with respect to recursive residual depth, and support several interpretability applications, including sketching a transformer language model with n-gram statistics extracted from its computations and indexing model toxicity levels without curated benchmarks.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Jet Expansions, a mathematical framework for decomposing language model computations into explicit input-to-output paths using generalized Taylor series operators. It resides in the 'Computational Path Formalization and Theory' leaf, which contains only two papers total. This is one of the sparsest research directions in the taxonomy, indicating a relatively underexplored theoretical niche focused on rigorous mathematical formalizations rather than empirical circuit discovery or application-driven interpretability.

The taxonomy reveals substantial activity in neighboring areas: Activation-Based Decomposition (five papers across two leaves) focuses on extracting features from hidden states, while Weight-Based and Circuit-Level Analysis (eight papers) emphasizes causal subgraph identification. Reasoning Path Decomposition (sixteen papers across three leaves) targets multi-step logic chains. Jet Expansions diverges by providing mathematical foundations for these empirical methods rather than proposing new feature extraction or circuit-tracing techniques. Its scope note explicitly excludes empirical circuit discovery, positioning it as theoretical infrastructure.

Among twenty-eight candidates examined, the contribution-level analysis shows mixed novelty signals. The core Jet Expansions framework (ten candidates examined, zero refutations) and the function decomposition perspective (ten candidates, zero refutations) appear relatively novel within the limited search scope. However, the claim of grounding existing interpretability tools encountered one refutable candidate among eight examined, suggesting some theoretical overlap with prior formalization efforts. The search scale is modest (top-K semantic matches plus citations), so these findings reflect local rather than exhaustive coverage.

Given the sparse theoretical leaf and limited search scope, the work appears to occupy a distinct formal niche. The framework's mathematical rigor and higher-order expansion machinery differentiate it from variance-based methods like Neural-ANOVA, though the grounding of existing tools shows some precedent. The analysis covers approximately thirty semantically related papers, leaving open whether broader theoretical literature in adjacent fields might reveal additional connections.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Paper: 1

Research Landscape Overview

Core task: decomposing language model computations into interpretable paths. The field has organized itself around several complementary perspectives on how to make neural network processing transparent. Activation-Based Decomposition and Feature Extraction focuses on identifying meaningful units within hidden representations, often using sparse autoencoders (Gated Sparse Autoencoders[4]) or feature circuits (Sparse Feature Circuits[1]) to isolate interpretable components. Weight-Based and Circuit-Level Analysis examines the connectivity and parameter structure that determines information flow, while Reasoning Path Decomposition and Explanation targets the step-by-step logic in multi-hop or chain-of-thought settings. Specialized Interpretability Applications adapt these techniques to domains like finance (Financial MI[7]) or knowledge graphs (KG-TRACES[8]), and Computational Path Formalization and Theory provides the mathematical underpinnings, such as decomposition algebras or probabilistic frameworks (Probabilistic Layer Decomposition[22]), that unify diverse methods. Architectural and Representational Foundations study how model design choices shape interpretability, and Cross-Domain and Emerging Applications explore novel settings from wireless networks (CoT Wireless[16]) to biological communication (Sperm Whale Vocalization[23]).

A particularly active line of work centers on formalizing how computations can be rigorously partitioned into additive or multiplicative contributions, with methods like Neural-ANOVA[44] offering variance-based decompositions and Jet Expansions[0] introducing higher-order Taylor-like expansions to capture nonlinear interactions. These theoretical frameworks contrast with more empirical circuit-tracing approaches (Information Flow Routes[15], Task-Specific Circuits[26]) that identify which subnetworks are causally responsible for specific behaviors.
Jet Expansions[0] sits squarely within the Computational Path Formalization branch, emphasizing rigorous mathematical decomposition rather than heuristic feature extraction. Compared to Neural-ANOVA[44], which partitions variance across input dimensions, Jet Expansions[0] extends the toolkit to higher-order terms, enabling finer-grained attribution of model outputs to interactions among features. This formal approach complements activation-based methods (Nonlinear Features[5]) by providing a principled basis for understanding how complex, nonlinear transformations emerge from simpler computational primitives.
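To make the contrast with variance-based, first-order attribution concrete, consider a toy function that is pure interaction (this sketch is not from either paper; `f` and `jet2` are illustrative names): at the expansion point, all first-order terms vanish, and the output is carried entirely by a second-order cross term, exactly the kind of effect that higher-order expansion terms are meant to capture.

```python
# Toy function whose output is pure interaction between its inputs.
def f(x, y):
    return x * y

# Second-order Taylor expansion of f around (0, 0):
# f(0, 0) = 0, both first derivatives at the origin are 0,
# and the only surviving Hessian entry is the cross derivative d2f/dxdy = 1.
def jet2(x, y):
    const = 0.0                   # f(0, 0)
    linear = 0.0 * x + 0.0 * y    # vanishing gradient terms
    cross = 1.0 * x * y           # second-order interaction term
    return const + linear + cross

# Because f is itself quadratic, the second-order jet is exact:
# a purely linear (first-order) attribution would report zero here.
assert jet2(0.3, -0.7) == f(0.3, -0.7)
```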

Claimed Contributions

Jet Expansions framework for restructuring LLM computations

The authors propose a principled mathematical framework that uses jet operators (functional counterparts of truncated Taylor series) to systematically decompose language models into explicit input-to-output computational paths and complementary remainders. This functional decomposition provides a systematic operator for cutting through entanglement in LLMs, enabling scalable model inspection without requiring additional data or training.

10 retrieved papers
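A minimal sketch of the jet idea on a scalar function (illustrative only; `jet1`, `approx`, and `remainder` are hypothetical names, and the paper applies such operators to vector-valued network components rather than scalars): a first-order jet replaces a function with its truncated Taylor series at an expansion point, and the remainder is defined so that jet plus remainder reproduces the original function exactly.

```python
import math

def jet1(f, df, x0):
    """First-order jet of f at expansion point x0: returns the
    truncated Taylor approximation and the exact remainder."""
    def approx(x):
        return f(x0) + df(x0) * (x - x0)
    def remainder(x):
        return f(x) - approx(x)
    return approx, remainder

# Toy example: expand exp around x0 = 0 (exp is its own derivative).
approx, rem = jet1(math.exp, math.exp, 0.0)

# Jet plus remainder recovers the original function exactly.
assert abs(approx(0.1) + rem(0.1) - math.exp(0.1)) < 1e-12
print(approx(0.1))  # 1 + 0.1 = 1.1
```

The same pattern, applied recursively to components of a network, is what yields explicit computational paths plus complementary remainder terms.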
Treating interpretability as function decomposition

The authors introduce a conceptual shift in interpretability methodology by framing it as a problem of function decomposition rather than traditional data-driven approaches. This perspective enables manipulation of functions directly in function space, requiring no probe datasets or sampling, and allows arbitrary portions of computation to be isolated from the monolithic transformer.

10 retrieved papers
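The exponential path structure behind this decomposition can be seen in a toy residual stack (a sketch, not the paper's construction; linear blocks g_i(x) = a_i * x are assumed so the path sum is exact with zero remainder): expanding (id + g_L) ∘ … ∘ (id + g_1) produces one additive path per subset of blocks traversed, 2^L paths in total.

```python
from itertools import product

# Toy residual stack with linear blocks g_i(x) = a_i * x.
coeffs = [0.5, -0.2, 0.3]

def full_network(x):
    for a in coeffs:
        x = x + a * x          # residual block: id + g_i
    return x

def sum_over_paths(x):
    """Evaluate every input-to-output path (one per subset of blocks)
    and sum them; for linear blocks this matches the full network."""
    total = 0.0
    for choice in product([0, 1], repeat=len(coeffs)):
        v = x
        for take, a in zip(choice, coeffs):
            if take:
                v = a * v      # apply g_i along this path
        total += v
    return total

assert abs(full_network(2.0) - sum_over_paths(2.0)) < 1e-12
print(2 ** len(coeffs))  # number of paths: 8
```

No probe data or sampling is involved: the decomposition is a manipulation of the function itself, which is the conceptual shift this contribution claims.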
Theoretical grounding of existing interpretability tools

The authors establish a rigorous mathematical foundation using jet operators that subsumes and generalizes existing interpretability techniques like Logit Lens and path expansion methods. This framework provides formal justification for these tools and extends them to new instantiations such as extracting n-gram probability tables directly from LLMs without requiring corpus data.

8 retrieved papers
Can Refute
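A toy illustration of the Logit Lens that this contribution claims to subsume (random weights; `W_U`, `h`, and `deltas` are made-up stand-ins, and LayerNorm is omitted): the lens applies the final unembedding to intermediate residual states, i.e., it reads out the path that routes the current residual state directly to the output head while deferring later layers' contributions to a remainder.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 4, 3, 2
W_U = rng.normal(size=(d_model, vocab))   # unembedding matrix
h = rng.normal(size=d_model)              # residual stream after embedding
deltas = [rng.normal(size=d_model) for _ in range(n_layers)]  # per-layer updates

# Logit Lens: apply the unembedding to the residual stream after each
# layer, reading off a token distribution before the network finishes.
for i, delta in enumerate(deltas):
    h = h + delta
    logits = h @ W_U
    print(f"after layer {i}: predicted token {int(np.argmax(logits))}")
```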

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Jet Expansions framework for restructuring LLM computations

Contribution

Treating interpretability as function decomposition

Contribution

Theoretical grounding of existing interpretability tools