A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators

ICLR 2026 Conference Submission | Anonymous Authors
Keywords: Causal Inference, Conditional Average Treatment Effect, Relative Error, Robust Evaluation
Abstract:

While significant progress has been made in heterogeneous treatment effect (HTE) estimation, the evaluation of HTE estimators remains underdeveloped. In this article, we propose a robust evaluation framework based on relative error, which quantifies the performance difference between two HTE estimators. We first derive the key theoretical conditions on the nuisance parameters that are necessary to achieve a robust estimator of the relative error. Building on these conditions, we introduce novel loss functions and design a neural network architecture for estimating the nuisance parameters, thereby obtaining a robust estimate of the relative error. We establish large-sample properties of the proposed relative error estimator. Furthermore, beyond evaluation, we propose a new learning algorithm for HTE that leverages both existing HTE estimators and the nuisance parameters learned through our neural network architecture. Extensive experiments demonstrate that our evaluation framework supports reliable comparisons across HTE estimators, and that the proposed learning algorithm exhibits desirable performance.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a robust evaluation framework for heterogeneous treatment effect estimators based on relative error, which quantifies performance differences between two HTE estimators. According to the taxonomy, this work sits in the 'Relative Error and Comparative Assessment' leaf, which contains only three papers total. This is a notably sparse research direction within the broader evaluation landscape, suggesting the paper addresses a relatively underexplored aspect of HTE estimator comparison. The leaf focuses specifically on frameworks using relative error or comparative metrics to rank estimators when ground truth is unobservable.

The taxonomy reveals that the broader 'Evaluation Metrics and Frameworks' branch contains five distinct evaluation approaches, including model selection methods, calibration assessment, matching-based evaluation, and experimental metrics. The paper's focus on relative error distinguishes it from neighboring leaves like 'Calibration and Uncertainty Quantification' (four papers) and 'Model Selection and Surrogate Metrics' (three papers). While calibration methods assess prediction reliability and model selection approaches develop surrogate metrics, this work concentrates on direct comparative assessment between estimators. The taxonomy's scope notes clarify that relative error frameworks explicitly exclude calibration assessment, positioning this contribution as complementary to rather than overlapping with uncertainty quantification approaches.

Among the three contributions analyzed, the evaluation framework based on relative error examined nine candidates with two appearing to provide overlapping prior work. The novel loss functions and neural network architecture for nuisance parameter estimation examined ten candidates with one potentially refutable match. The new learning algorithm for HTE examined ten candidates with no clear refutations found. These statistics reflect a limited search scope of twenty-nine total candidates examined across all contributions. The evaluation framework contribution shows the most substantial prior work overlap, while the learning algorithm appears more novel within the examined candidate set.

Based on the limited search scope of approximately thirty semantically similar papers, the work appears to make contributions in a relatively sparse research area. The taxonomy structure suggests that relative error-based evaluation remains underdeveloped compared to other evaluation approaches. However, the analysis acknowledges that this assessment derives from top-K semantic search rather than exhaustive literature review, and the refutation statistics reflect only the examined candidates rather than the complete field.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 3

Research Landscape Overview

Core task: evaluation of heterogeneous treatment effect estimators. The field has organized itself around several complementary dimensions. Evaluation Metrics and Frameworks focuses on how to measure estimator performance, including relative error approaches like Relative Error Evaluation[0] and Trustworthy Relative Error[8], as well as broader assessment strategies such as Trustworthy Assessment HTE[30]. Estimation Methods and Algorithms encompasses the diverse algorithmic landscape, from forest-based approaches like Random Forests HTE[5] and Forest Estimators Work[1] to meta-learners such as Metalearners[13] and ensemble strategies like Ensemble ITE Method[3]. Methodological Considerations and Robustness addresses stability and reliability concerns, exemplified by Stable HTE Estimation[2] and Robust ITE Method[4]. Specialized Settings and Extensions covers adaptations to particular data structures or constraints, including panel data methods like Two-way Fixed Effects[29] and survival analysis via Causal Survival Forests[34]. Application Domains and Practical Implementation bridges theory and practice, with works like Student Success Ensemble[42] demonstrating real-world deployment.

A central tension in the field concerns how to reliably compare estimators when ground-truth heterogeneous effects are unobserved. Some lines of work emphasize model selection and comparative assessment, as seen in Model Selection Comparison[9] and Reliable Estimator Selection[17], while others focus on calibration and trustworthiness guarantees, such as Causal Isotonic Calibration[33] and Calibration Assessment Nonparametric[41].

Relative Error Evaluation[0] sits squarely within the comparative assessment cluster, sharing with Trustworthy Relative Error[8] and Trustworthy Assessment HTE[30] a focus on developing metrics that can rank estimators without requiring oracle knowledge of true treatment effects. Where Trustworthy Assessment HTE[30] may emphasize broader frameworks for trustworthiness, Relative Error Evaluation[0] appears to concentrate specifically on relative error measures as a principled basis for comparison, offering a complementary perspective on how practitioners might choose among competing methods in observational settings.

Claimed Contributions

Robust evaluation framework for HTE estimators based on relative error

The authors introduce an evaluation framework that uses relative error to compare heterogeneous treatment effect estimators. This framework achieves robustness by relaxing the requirement for consistent outcome regression models while maintaining desirable statistical properties such as root-n consistency and asymptotic normality.

9 retrieved papers
Can Refute
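To make the comparison idea concrete, the sketch below illustrates one standard way a relative-error-style metric can be computed without observing the true CATE: score each candidate estimator against a doubly robust (AIPW) pseudo-outcome and take the ratio of mean squared errors. This is an illustrative stand-in, not the paper's estimator; the function names (`dr_pseudo_outcome`, `relative_error`), the synthetic data, and the use of oracle nuisances are all assumptions for the demo, whereas the paper's construction specifically relaxes the consistency requirement on the outcome regressions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Synthetic data with a known CATE: tau(x) = 1 + x.
x = rng.normal(size=n)
tau = 1.0 + x
t = rng.binomial(1, 0.5, size=n)          # randomized treatment, e(x) = 0.5
y = x + t * tau + rng.normal(size=n)      # mu0(x) = x, mu1(x) = x + tau(x)

def dr_pseudo_outcome(y, t, e_hat, mu0_hat, mu1_hat):
    """AIPW pseudo-outcome; its conditional mean given x equals the CATE."""
    return (mu1_hat - mu0_hat
            + t * (y - mu1_hat) / e_hat
            - (1 - t) * (y - mu0_hat) / (1 - e_hat))

def relative_error(tau_a, tau_b, phi):
    """Plug-in relative error of estimator A against estimator B: the ratio of
    their mean squared errors against the pseudo-outcome phi.
    Values below 1 favour A, values above 1 favour B."""
    return np.mean((phi - tau_a) ** 2) / np.mean((phi - tau_b) ** 2)

phi = dr_pseudo_outcome(y, t, 0.5, x, x + tau)   # oracle nuisances, for illustration
tau_a = tau                                      # a perfect candidate estimator
tau_b = tau + rng.normal(scale=1.0, size=n)      # a noisier candidate
ratio = relative_error(tau_a, tau_b, phi)
```

Because the same noisy pseudo-outcome appears in both numerator and denominator, its variance partially cancels in the comparison; here the perfect candidate yields a ratio below one, correctly ranking it above the noisier one.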
Novel loss functions and neural network architecture for nuisance parameter estimation

The authors design new loss functions (weighted least squares loss and balance regularizers) and propose a neural network architecture based on Dragonnet to estimate nuisance parameters (propensity scores and outcome regression models). This enables more reliable relative error estimation without requiring consistent outcome regression models.

10 retrieved papers
Can Refute
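As a rough illustration of the two loss ingredients named above, the sketch below shows a generic inverse-propensity-weighted least squares loss and a simple covariate-balance penalty (the squared gap between IPW-reweighted covariate means of the two arms). These are hedged stand-ins assumed for this demo, not the paper's actual losses or its Dragonnet-based architecture; all function names and the toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Confounded toy data: treatment probability depends on the covariate.
x = rng.normal(size=(n, 1))
e = 1.0 / (1.0 + np.exp(-x[:, 0]))        # true propensity score
t = rng.binomial(1, e)
y = x[:, 0] + t * 1.0 + rng.normal(size=n)

def weighted_ls_loss(y, t, mu_hat, e_hat):
    """Inverse-propensity-weighted least squares loss for an outcome model."""
    w = t / e_hat + (1 - t) / (1 - e_hat)
    return float(np.mean(w * (y - mu_hat) ** 2))

def balance_regularizer(x, t, e_hat):
    """Squared gap between IPW-reweighted covariate means of the two arms;
    small when e_hat successfully balances treated and control groups."""
    w1, w0 = t / e_hat, (1 - t) / (1 - e_hat)
    m1 = (w1[:, None] * x).sum(axis=0) / w1.sum()
    m0 = (w0[:, None] * x).sum(axis=0) / w0.sum()
    return float(np.sum((m1 - m0) ** 2))

naive_gap_sq = float((x[t == 1].mean() - x[t == 0].mean()) ** 2)
reg = balance_regularizer(x, t, e)        # near zero under the true propensity
loss = weighted_ls_loss(y, t, x[:, 0] + t, e)
```

The point of such a regularizer is that a propensity head can be penalized for residual imbalance directly, rather than trusting that a well-fit outcome model exists; with the true propensity, the weighted gap is far below the raw between-group gap.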
New learning algorithm for HTE leveraging nuisance parameters

The authors develop a learning algorithm for heterogeneous treatment effects that aggregates information from candidate HTE estimators and nuisance parameters estimated by their proposed neural network. This algorithm demonstrates improved performance compared to existing methods.

10 retrieved papers
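One simple form such an aggregation step could take is a stacking-style combination: regress a pseudo-outcome proxy on the candidate CATE predictions and use the fitted weights to blend them. The sketch below assumes this stacking reading; `aggregate_cate`, the pseudo-outcome `phi`, and the synthetic candidates are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3000
tau = 1.0 + rng.normal(size=n)             # stand-in for the true CATE values

# Two imperfect candidate HTE estimators and a noisy but unbiased proxy target.
cand1 = tau + rng.normal(scale=0.5, size=n)
cand2 = tau + rng.normal(scale=1.0, size=n)
phi = tau + rng.normal(scale=2.0, size=n)

def aggregate_cate(candidates, phi):
    """Combine candidate CATE predictions by least squares against phi.
    Since each single candidate is itself a feasible weight vector, the fitted
    combination can do no worse in-sample than any individual candidate."""
    F = np.column_stack(candidates)        # (n, K) candidate predictions
    w, *_ = np.linalg.lstsq(F, phi, rcond=None)
    return F @ w, w

agg, w = aggregate_cate([cand1, cand2], phi)
mse_agg = np.mean((phi - agg) ** 2)
mse_best = min(np.mean((phi - cand1) ** 2), np.mean((phi - cand2) ** 2))
```

The guarantee illustrated here is purely in-sample and relative to the proxy; whether the blend improves error against the true CATE depends on the quality of the nuisance estimates, which is exactly what the paper's architecture is designed to supply.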

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Robust evaluation framework for HTE estimators based on relative error

Contribution 2: Novel loss functions and neural network architecture for nuisance parameter estimation

Contribution 3: New learning algorithm for HTE leveraging nuisance parameters
