Structural Inference: Interpreting Small Language Models with Susceptibilities
Overview
Overall Novelty Assessment
The paper introduces a linear response framework that treats neural networks as Bayesian statistical mechanical systems, deriving susceptibilities to data-distribution perturbations and using these to identify functional modules. Within the taxonomy, it occupies the 'Bayesian and Statistical Mechanical Interpretability Methods' leaf under 'Linear Response Theory and Statistical Physics Frameworks'. Notably, this leaf contains only the original paper itself, with no sibling papers present, indicating a relatively sparse and novel research direction within the broader field of neural network interpretability through perturbation analysis.
The taxonomy reveals that neighboring work exists primarily in two directions: neuronal network models applying linear response to biological systems (two papers in 'Neuronal Network Linear Response Models') and geometric analyses of layer representations without explicit statistical mechanical framing (four papers across geometric subtopics). The original paper bridges these areas by applying physics-inspired linear response theory specifically to artificial neural networks for interpretability, rather than modeling biological neurons or analyzing static geometric properties. This positioning suggests the work synthesizes concepts from adjacent branches, namely statistical physics formalism and interpretability goals, in a combination not extensively explored by prior literature.
Among the 30 candidates examined across the three contributions, none clearly refuted any claimed novelty. The 'Susceptibility framework' contribution was checked against 10 candidates with zero refutable matches, as were the 'Structural inference methodology' and 'Per-token attribution scores' contributions. Within the examined literature, the specific combination of Bayesian statistical mechanics, susceptibility-based attribution, and modular structure inference therefore appears unexplored. However, the search covered only the top-30 semantic matches, leaving open the possibility of relevant work outside this scope.
Given the limited search scale and the paper's placement in an otherwise unpopulated taxonomy leaf, the work appears to occupy a genuinely novel intersection of statistical physics and neural network interpretability. The absence of sibling papers and zero refutable candidates across 30 examined works supports this impression, though the analysis cannot rule out relevant prior work beyond the top-K semantic neighborhood. The framework's distinctiveness lies in its integrated approach, combining Bayesian statistical mechanics, local posterior sampling, and low-rank factorization, rather than in any single component in isolation.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors develop a linear response framework rooted in statistical physics and Bayesian learning theory that treats neural networks as statistical mechanical systems. Susceptibilities measure how infinitesimal perturbations to the data distribution induce first-order changes in the expected behavior of network components, providing a principled link between data structure and model internals.
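The first-order change described above can be written as the standard linear-response (cumulant) identity for Gibbs-form posteriors. The tempered-posterior notation below is illustrative and need not match the paper's exact conventions:

```latex
% Tempered Bayesian posterior over weights w, with loss L_\epsilon under the
% perturbed data distribution q_\epsilon = (1-\epsilon)\,q + \epsilon\,p:
p_\epsilon(w) \;\propto\; \exp\!\big(-n\beta\, L_\epsilon(w)\big)\,\varphi(w),
\qquad
L_\epsilon(w) \;=\; (1-\epsilon)\,L_q(w) + \epsilon\, L_p(w).

% Differentiating the posterior expectation of an observable O at \epsilon = 0
% expresses the susceptibility as a posterior covariance:
\chi_O
\;=\; \left.\frac{d}{d\epsilon}\,\langle O \rangle_\epsilon\right|_{\epsilon=0}
\;=\; -\,n\beta\;\mathrm{Cov}_{p_0}\!\big(O(w),\; L_p(w) - L_q(w)\big).
```

The covariance form is what makes estimation tractable: it requires only samples from the unperturbed posterior, not retraining under each perturbed distribution.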
The authors present a methodology that uses response matrices of susceptibilities combined with PCA to identify functional modules and internal structure in neural networks. This approach reveals how models balance expression and suppression of patterns, enabling discovery of circuits like induction heads through data-driven analysis.
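A minimal sketch of the response-matrix-plus-PCA step, using a synthetic rank-2 matrix in place of measured susceptibilities. All shapes, names, and the planted two-module structure are illustrative assumptions, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical response matrix: rows = network components (e.g. attention
# heads), columns = data-distribution perturbations. Entry chi[i, j] is the
# susceptibility of component i to perturbation j. Here we plant a rank-2
# "modular" structure plus noise in place of real measurements.
n_components, n_perturbations = 12, 40
modules = rng.normal(size=(n_components, 2))      # two latent modules
patterns = rng.normal(size=(2, n_perturbations))  # their response profiles
chi = modules @ patterns + 0.05 * rng.normal(size=(n_components, n_perturbations))

# PCA via SVD of the centered response matrix: leading principal components
# give shared response directions; row loadings group components into modules.
chi_centered = chi - chi.mean(axis=0, keepdims=True)
U, S, Vt = np.linalg.svd(chi_centered, full_matrices=False)
explained = S**2 / np.sum(S**2)

loadings = U[:, :2] * S[:2]  # each row: a component's coordinates in module space
print("variance explained by top 2 PCs:", explained[:2].sum())
```

With a genuinely low-rank response matrix, a few principal components capture nearly all variance, and components with similar loadings form candidate functional modules.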
The authors show that susceptibilities can be decomposed into per-token contributions with interpretable signs (positive for suppression, negative for expression). These token-level attribution scores can be efficiently estimated using local Stochastic Gradient Langevin Dynamics sampling around network checkpoints.
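A toy sketch of the estimation recipe, with a two-parameter least-squares model standing in for a network checkpoint. The SGLD step, the localization strength `gamma`, and the covariance-based per-token score are assumptions about the general procedure, not the paper's exact estimator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a network: parameters w, per-"token" squared losses.
X = rng.normal(size=(5, 2))          # one row per token context (hypothetical)
y = rng.normal(size=5)
w0 = np.array([1.0, -1.0])           # checkpoint we sample around

def per_token_losses(w):
    return (X @ w - y) ** 2          # loss contribution of each token

def observable(w):
    return w[0]                      # e.g. a component's expected activation

# Local SGLD around w0: gradient of n*beta*(mean loss) plus a localization
# term gamma*(w - w0), with Gaussian noise scaled to the step size eta.
n, beta, gamma, eta, steps = 5, 1.0, 100.0, 1e-3, 20_000
w = w0.copy()
samples_O, samples_L = [], []
for step in range(steps):
    grad = beta * 2 * X.T @ (X @ w - y) + gamma * (w - w0)
    w = w - 0.5 * eta * grad + rng.normal(scale=np.sqrt(eta), size=2)
    if step > 2_000:                 # discard burn-in
        samples_O.append(observable(w))
        samples_L.append(per_token_losses(w))

O_s = np.array(samples_O)
L_s = np.array(samples_L)            # shape (n_samples, n_tokens)

# Per-token score: (negative) covariance between the observable and each
# token's loss over the local posterior samples. Under the sign convention
# in the text, positive ~ suppression, negative ~ expression.
chi_tokens = -n * beta * np.mean(
    (O_s[:, None] - O_s.mean()) * (L_s - L_s.mean(axis=0)), axis=0
)
print(chi_tokens)
```

Because the covariance decomposes over tokens, the same SGLD sample set yields all per-token scores at once, which is what makes the attribution cheap relative to per-token retraining.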
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
Susceptibility framework for neural network interpretability
The authors develop a linear response framework rooted in statistical physics and Bayesian learning theory that treats neural networks as statistical mechanical systems. Susceptibilities measure how infinitesimal perturbations to the data distribution induce first-order changes in the expected behavior of network components, providing a principled link between data structure and model internals.
[38] Diagnosing model performance under distribution shift
[39] Measuring robustness to natural distribution shifts in image classification
[40] Measuring domain shift for deep learning in histopathology
[41] Adaptive State Estimation and Continual Learning under Data Distribution Shift
[42] Dish-ts: a general paradigm for alleviating distribution shift in time series forecasting
[43] Intermediate layer classifiers for ood generalization
[44] Ensemble-based deep reinforcement learning for vehicle routing problems under distribution shift
[45] Agreement-on-the-line: Predicting the performance of neural networks under distribution shift
[46] Prediction Accuracy & Reliability: Classification and Object Localization Under Distribution Shift
[47] Incomplete Multisource Domain Adaptation for Fault Diagnosis of Blast Furnace
Structural inference methodology
The authors present a methodology that uses response matrices of susceptibilities combined with PCA to identify functional modules and internal structure in neural networks. This approach reveals how models balance expression and suppression of patterns, enabling discovery of circuits like induction heads through data-driven analysis.
[18] Big data and fuzzy logic for demand forecasting in supply chain management: a data-driven approach
[19] Data-Driven Object Tracking: Integrating Modular Neural Networks into a Kalman Framework
[20] FE: an efficient data-driven multiscale approach based on physics-constrained neural networks and automated data mining
[21] AIFS--ECMWF's data-driven forecasting system
[22] An ai-driven, scalable, and modular digital twin framework for traffic management
[23] A unifying principle for the functional organization of visual cortex
[24] Sensor-fault detection, isolation and accommodation for digital twins via modular data-driven architecture
[25] Dynamic system modeling using a multisource transfer learning-based modular neural network for industrial application
[26] Data-driven discovery of intrinsic dynamics
[27] Data-driven emergence of convolutional structure in neural networks
Per-token susceptibility attribution scores
The authors show that susceptibilities can be decomposed into per-token contributions with interpretable signs (positive for suppression, negative for expression). These token-level attribution scores can be efficiently estimated using local Stochastic Gradient Langevin Dynamics sampling around network checkpoints.