Foundation Models for Causal Inference via Prior-Data Fitted Networks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Causal Inference · Treatment Effect Estimation · Foundation Models
Abstract:

Prior-data fitted networks (PFNs) have recently been proposed as a promising way to train tabular foundation models. PFNs are transformers that are pre-trained on synthetic data generated from a prespecified prior distribution and that enable Bayesian inference through in-context learning. In this paper, we introduce CausalFM, a comprehensive framework for training PFN-based foundation models in various causal inference settings. First, we formalize the construction of Bayesian priors for causal inference based on structural causal models (SCMs) in a principled way and derive necessary criteria for the validity of such priors. Building on this, we propose a novel family of prior distributions using causality-inspired Bayesian neural networks that enable CausalFM to perform Bayesian causal inference in various settings, including back-door, front-door, and instrumental variable adjustment. Finally, we instantiate CausalFM and train our foundation models for estimating conditional average treatment effects (CATEs) for different settings. We show that CausalFM performs competitively for CATE estimation using various synthetic and semi-synthetic benchmarks. In sum, our framework can be used as a general recipe to train foundation models for various causal inference settings. In contrast to the current state-of-the-art in causal inference, CausalFM offers a novel paradigm with the potential to fundamentally change how practitioners perform causal inference in medicine, economics, and other disciplines.
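The abstract's description of PFN pre-training can be made concrete with a toy sketch. The snippet below uses hypothetical names and a deliberately simple random-linear-function prior (not any prior from the paper); it shows only the data side of the recipe: each pre-training example is an entire dataset drawn from the prior, split into a context set the transformer conditions on and query points whose labels it learns to predict.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task(n_context=32, n_query=8, dim=4):
    """Draw one synthetic dataset from a toy prior (random linear
    function plus Gaussian noise); real PFN priors are far richer."""
    w = rng.normal(size=dim)                          # latent "hypothesis"
    X = rng.normal(size=(n_context + n_query, dim))
    y = X @ w + 0.1 * rng.normal(size=n_context + n_query)
    return (X[:n_context], y[:n_context]), (X[n_context:], y[n_context:])

# A PFN is a transformer trained so that, given the context set as input
# tokens, its prediction for each query x approximates the Bayesian
# posterior predictive p(y | x, context) induced by the sampling prior.
(ctx_X, ctx_y), (qry_X, qry_y) = sample_task()
```

Repeating this sampling across millions of tasks is what lets a single trained network perform approximate Bayesian inference on a new dataset purely in-context, with no retraining.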

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces CausalFM, a framework for training PFN-based foundation models to perform Bayesian causal inference across multiple identification strategies (back-door, front-door, instrumental variables). It resides in the 'Amortized Causal Effect Estimation' leaf, which contains only three papers total, including this one. This is a relatively sparse research direction within the broader taxonomy of 41 papers across 20 leaf nodes, suggesting that PFN-based amortized causal inference remains an emerging area with limited prior work directly addressing the same scope.

The taxonomy reveals that CausalFM sits within a larger branch of 'Causal Inference via PFN-Based Foundation Models' (seven papers across four leaves), which itself is one of four major branches. Neighboring leaves address causal discovery using PFN embeddings and causal fairness applications, while sibling branches explore non-PFN foundation models (LLMs, diffusion models) for causal reasoning and domain-specific integrations. The scope note for the parent branch explicitly focuses on 'methods using PFN architectures to estimate causal effects,' distinguishing this work from general tabular prediction methods and non-PFN causal approaches found elsewhere in the taxonomy.

Among 30 candidates examined, the first contribution (CausalFM framework) shows one refutable candidate out of 10 examined, indicating some overlap with existing PFN-based causal inference work within this limited search scope. The second contribution (formalization of Bayesian priors for causal inference based on SCMs) and third contribution (causality-inspired Bayesian neural network priors) each examined 10 candidates with zero refutations, suggesting these specific technical elements may be more novel. However, the search scale is modest—30 candidates total—so these statistics reflect top-K semantic matches rather than exhaustive coverage of the causal inference literature.

Given the sparse taxonomy leaf (three papers) and limited search scope, CausalFM appears to occupy a relatively underexplored niche at the intersection of PFN architectures and multi-strategy causal inference. The framework-level contribution shows some prior work overlap, while the technical innovations around SCM-based priors and causality-inspired BNN distributions appear less directly anticipated by the examined candidates. A more exhaustive search beyond top-30 semantic matches would be needed to assess whether these elements have precedents in the broader causal inference or Bayesian deep learning communities.

Taxonomy

Core-task Taxonomy Papers: 41
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: causal inference via foundation models using prior-data fitted networks. The field structure reflects a convergence of modern deep learning architectures with classical causal inference objectives. At the highest level, the taxonomy distinguishes between works that develop PFN-specific architectures and training procedures (such as TabPFN[8] and its extensions), those that apply PFN-based models directly to causal tasks like effect estimation and discovery, and a parallel stream exploring non-PFN foundation models (including large language models like Causal Reasoning LLMs[3] and Causality LLMs[9]) for causal reasoning. Additional branches capture domain-specific integrations—ranging from molecular causality to climate risk—and broader conceptual perspectives on how causality and foundation models interact, as seen in works like Causal Foundation Duality[19] and Causal Attention Duality[21].

Within the PFN-based causal inference branch, a particularly active line focuses on amortized causal effect estimation, where models learn to predict treatment effects in-context without retraining. Foundation Models Causal Inference[0] sits squarely in this cluster, emphasizing rapid, amortized inference for causal queries. Nearby works like CausalPFN[12] and Do-PFN[14] share this amortization theme but may differ in how they handle confounding or interventional distributions. In contrast, other PFN applications target causal discovery (Amortized Causal Discovery[22]) or fairness-aware prediction (FairPFN[23]), illustrating the breadth of causal tasks that PFNs can address.

Meanwhile, non-PFN approaches such as Tabular Foundation Model[1] and Bayesian Tabular Foundation[5] offer alternative pathways for tabular causal inference, trading off the in-context learning speed of PFNs against potentially richer uncertainty quantification or broader applicability.
The central tension across these directions revolves around balancing expressiveness, computational efficiency, and the ability to generalize across diverse causal structures with minimal task-specific tuning.

Claimed Contributions

CausalFM framework for training PFN-based foundation models for causal inference

The authors propose CausalFM, a general framework that enables training prior-data fitted network (PFN) foundation models to perform causal inference across multiple settings including back-door, front-door, and instrumental variable adjustment. This framework allows practitioners to perform causal inference through in-context learning without retraining for each new dataset.

10 retrieved papers (1 can refute)
Formalization of Bayesian priors for causal inference based on structural causal models

The authors provide a principled formalization for constructing Bayesian priors based on structural causal models (SCMs) for causal inference. They derive necessary validity criteria for such priors, including the concept of well-specified priors that ensure consistent estimation of causal queries.

10 retrieved papers
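As a concrete illustration of an SCM-based prior, the sketch below (hypothetical names, a minimal linear back-door SCM with structure X -> T, X -> Y, T -> Y; not the paper's actual construction) first samples random structural coefficients, i.e. a causal model, and then a dataset from that model, so the ground-truth CATE is known for every draw:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_backdoor_scm(n=500):
    """One draw from a toy SCM prior for the back-door setting.
    The coefficients are themselves random, so each call yields a
    different causal model plus a dataset sampled from it."""
    a, b, c = rng.normal(size=3)            # random structural coefficients
    X = rng.normal(size=n)                  # observed confounder
    T = (rng.random(n) < 1 / (1 + np.exp(-a * X))).astype(float)
    Y = b * X + c * T + 0.1 * rng.normal(size=n)
    cate = lambda x: c * np.ones_like(x)    # ground truth: tau(x) = c here
    return X, T, Y, cate
```

Because every sampled dataset comes with its true tau(x), a PFN can be pre-trained to map observational data directly to the causal query; intuitively, this is the sense in which such a prior must be well-specified for consistent estimation.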
Novel family of prior distributions using causality-inspired Bayesian neural networks

The authors introduce a new family of prior distributions that leverage Bayesian neural networks designed to respect the causal structure of the inference problem. These priors enable CausalFM to perform Bayesian causal inference across different settings while providing identifiability guarantees.

10 retrieved papers
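A causality-inspired Bayesian-neural-network prior can be sketched by drawing each structural function as an independent random MLP, with the causal structure enforced by which variables each network receives as input. The snippet below is a toy illustration with hypothetical names for the back-door structure X -> T, X -> Y, T -> Y; the paper's actual prior family is richer:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_mlp(d_in, width=16):
    """Sample one random MLP f: R^d_in -> R, i.e. a single draw of
    weights from a Gaussian BNN prior with 1/fan_in scaling."""
    W1 = rng.normal(scale=d_in ** -0.5, size=(d_in, width))
    b1 = rng.normal(scale=0.1, size=width)
    W2 = rng.normal(scale=width ** -0.5, size=width)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2

def sample_bnn_scm(n=200):
    """Each structural function is an independent BNN draw; the causal
    structure is hard-wired by which inputs each network sees."""
    f_t, f_y = draw_mlp(1), draw_mlp(2)
    X = rng.normal(size=(n, 1))
    T = (rng.random(n) < 1 / (1 + np.exp(-f_t(X)))).astype(float)
    Y = f_y(np.column_stack([X[:, 0], T])) + 0.1 * rng.normal(size=n)
    return X, T, Y
```

Swapping which inputs feed each network (e.g., adding an instrument Z that affects only T) would adapt the same recipe to front-door or instrumental-variable settings, in the spirit of the multi-setting framework described above.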

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

CausalFM framework for training PFN-based foundation models for causal inference


Contribution

Formalization of Bayesian priors for causal inference based on structural causal models


Contribution

Novel family of prior distributions using causality-inspired Bayesian neural networks

