Exploratory Causal Inference in SAEnce
Claimed Contributions
The authors establish a formal framework distinguishing between rationalist approaches (hypothesis-driven causal inference with predefined outcomes) and empiricist approaches (data-driven discovery of treatment effects). They characterize these paradigms within statistical causality, showing how they complement each other in scientific discovery.
The authors introduce a methodology that combines pretrained foundation models with sparse autoencoders to discover treatment effects in exploratory experiments. They identify and formalize the paradox of exploratory causal inference, showing how standard multiple testing fails when neural representations are entangled.
The authors develop Neural Effect Search, a recursive stratification procedure that addresses multiple-testing issues and effect entanglement in neural representations. The algorithm iteratively identifies significant causal effects while controlling for dependencies between neurons through progressive stratification.
Contribution Analysis
Detailed comparisons for each claimed contribution
Formal differentiation of rationalist and empiricist approaches to causal inference
The authors establish a formal framework distinguishing between rationalist approaches (hypothesis-driven causal inference with predefined outcomes) and empiricist approaches (data-driven discovery of treatment effects). They characterize these paradigms within statistical causality, showing how they complement each other in scientific discovery.
[55] Truth, knowledge, and entrepreneurship theory: arguments for a rationalist scientific epistemology
[56] Radical empiricism and machine learning research
[57] Logical empiricism
[58] Mechanisms and mechanistic reasoning in medicine
[59] Between rationalism and empiricism
[60] Realism, empiricism and causal inquiry in International Relations: What is at stake?
[61] Method and Analogy in Hellenistic Medicine
[62] Causal learning in rats and humans: A minimal rational model
[63] Aristotle's Induction and the Inference of First Principles
[64] Causometry
Novel empiricist methodology using foundation models and sparse autoencoders
The authors introduce a methodology that combines pretrained foundation models with sparse autoencoders to discover treatment effects in exploratory experiments. They identify and formalize the paradox of exploratory causal inference, showing how standard multiple testing fails when neural representations are entangled.
[65] Sparse autoencoders for scientifically rigorous interpretation of vision models
[66] Improving Steering Vectors by Targeting Sparse Autoencoder Features
[67] Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models
[68] Applying sparse autoencoders to unlearn knowledge in language models
[69] A Deep Learning Framework for Causal Inference in Clinical Trial Design: The CURE AI Large Clinicogenomic Foundation Model
[70] SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
[71] Sparse autoencoders reveal temporal difference learning in large language models
[72] Can role vectors affect LLM behaviour
[73] SAEs can improve unlearning: Dynamic sparse autoencoder guardrails for precision unlearning in LLMs
[74] Prototype-Based Multiple Instance Learning for Gigapixel Whole Slide Image Classification
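The claim that standard multiple testing fails under entangled representations can be made concrete with a small simulation. The sketch below is not the authors' code or data: the function names (simulate, two_sample_p, bonferroni_discoveries), the effect size, the leak coefficient, and the use of a Welch z-test with Bonferroni correction are all assumptions chosen for illustration. It simulates a randomized experiment in which the treatment truly affects only neuron 0, while neuron 1 merely carries a leaked copy of neuron 0's activation.

```python
import math
import random
import statistics


def simulate(n, n_neurons=20, leak=0.5, seed=0):
    """Toy randomized experiment; every number here is an illustrative assumption."""
    rng = random.Random(seed)
    treat, acts = [], []
    for _ in range(n):
        t = rng.random() < 0.5                      # randomized treatment assignment
        z0 = (1.0 if t else 0.0) + rng.gauss(0, 1)  # neuron 0: the only true effect
        z1 = leak * z0 + rng.gauss(0, 1)            # neuron 1: entangled with neuron 0
        noise = [rng.gauss(0, 1) for _ in range(n_neurons - 2)]
        treat.append(t)
        acts.append([z0, z1] + noise)
    return treat, acts


def two_sample_p(a, b):
    """Two-sided p-value of a Welch z-test (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni_discoveries(treat, acts, alpha=0.05):
    """Test every neuron separately and keep those passing the Bonferroni threshold."""
    m = len(acts[0])
    found = []
    for j in range(m):
        treated = [row[j] for row, t in zip(acts, treat) if t]
        control = [row[j] for row, t in zip(acts, treat) if not t]
        if two_sample_p(treated, control) < alpha / m:
            found.append(j)
    return found


treat, acts = simulate(n=4000)
discoveries = bonferroni_discoveries(treat, acts)
print("neurons flagged:", discoveries)
```

In this toy setup neuron 1 is flagged alongside neuron 0 even though it has no effect of its own, and the correction does not help because the leaked signal is a real marginal difference rather than a chance finding. Larger samples would make even weaker leaks detectable, which is one way to read the paradox described above.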
Neural Effect Search algorithm for iterative hypothesis testing
The authors develop Neural Effect Search, a recursive stratification procedure that addresses multiple-testing issues and effect entanglement in neural representations. The algorithm iteratively identifies significant causal effects while controlling for dependencies between neurons through progressive stratification.
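The summary above does not spell out the procedure, so the following is only one plausible reading of "recursive stratification", not the authors' Neural Effect Search implementation. Under that assumption, each round tests the remaining neurons within the current strata, keeps the most significant neuron if it passes a Bonferroni threshold, then refines the strata by quantile bins of that neuron's activation and recurses. The names effect_search, stratified_p, and quantile_bin, the inverse-variance pooling, the bin count, and the toy data are all hypothetical.

```python
import math
import random
import statistics
from collections import defaultdict


def simulate(n, n_neurons=20, leak=0.5, seed=0):
    """Toy experiment: true effect on neuron 0 only; neuron 1 leaks from neuron 0."""
    rng = random.Random(seed)
    treat, acts = [], []
    for _ in range(n):
        t = rng.random() < 0.5
        z0 = (1.0 if t else 0.0) + rng.gauss(0, 1)
        z1 = leak * z0 + rng.gauss(0, 1)
        treat.append(t)
        acts.append([z0, z1] + [rng.gauss(0, 1) for _ in range(n_neurons - 2)])
    return treat, acts


def quantile_bin(values, n_bins):
    """Map each value to a quantile-bin index in [0, n_bins)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def stratified_p(treat, column, strata):
    """Two-sided p-value pooling per-stratum mean differences by inverse variance."""
    groups = defaultdict(lambda: ([], []))
    for t, x, s in zip(treat, column, strata):
        groups[s][0 if t else 1].append(x)
    num = den = 0.0
    for treated, control in groups.values():
        if len(treated) < 2 or len(control) < 2:
            continue  # stratum offers no usable treated/control contrast
        var = (statistics.variance(treated) / len(treated)
               + statistics.variance(control) / len(control))
        if var == 0.0:
            continue
        num += (statistics.fmean(treated) - statistics.fmean(control)) / var
        den += 1.0 / var
    if den == 0.0:
        return 1.0
    return math.erfc(abs(num / math.sqrt(den)) / math.sqrt(2))


def effect_search(treat, acts, alpha=0.01, n_bins=20, selected=(), strata=None):
    """Recursively select neurons, stratifying on each selected neuron in turn."""
    if strata is None:
        strata = [()] * len(acts)  # a single stratum containing every unit
    remaining = [j for j in range(len(acts[0])) if j not in selected]
    if not remaining:
        return list(selected)
    pvals = {j: stratified_p(treat, [row[j] for row in acts], strata)
             for j in remaining}
    best = min(pvals, key=pvals.get)
    if pvals[best] >= alpha / len(remaining):
        return list(selected)  # nothing significant is left: stop
    bins = quantile_bin([row[best] for row in acts], n_bins)
    finer = [s + (b,) for s, b in zip(strata, bins)]
    return effect_search(treat, acts, alpha, n_bins, selected + (best,), finer)


treat, acts = simulate(n=4000)
selected = effect_search(treat, acts)
print("neurons selected:", selected)
```

In this sketch neuron 0 is selected first, and once the data are stratified on its activation the leaked signal in neuron 1 largely disappears within strata, so the search is expected to stop without selecting it. Quantile bins are a coarse form of conditioning, so a small residual dependence remains inside each bin; finer bins reduce it at the cost of fewer units per stratum.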