A Relative Error-Based Evaluation Framework of Heterogeneous Treatment Effect Estimators

ICLR 2026 Conference Submission | Anonymous Authors
Keywords: Causal Inference, Conditional Average Treatment Effect, Relative Error, Robust Evaluation
Abstract:

While significant progress has been made in heterogeneous treatment effect (HTE) estimation, the evaluation of HTE estimators remains underdeveloped. In this article, we propose a robust evaluation framework based on relative error, which quantifies the performance difference between two HTE estimators. We first derive the key theoretical conditions on the nuisance parameters that are necessary to achieve a robust estimator of the relative error. Building on these conditions, we introduce novel loss functions and design a neural network architecture for estimating the nuisance parameters, thereby obtaining a robust estimate of the relative error. We establish large-sample properties of the proposed relative error estimator. Furthermore, beyond evaluation, we propose a new learning algorithm for HTE that leverages both existing HTE estimators and the nuisance parameters learned through our neural network architecture. Extensive experiments demonstrate that our evaluation framework supports reliable comparisons across HTE estimators, and that the proposed learning algorithm exhibits desirable performance.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a robust evaluation framework for heterogeneous treatment effect estimators based on relative error, which quantifies performance differences between two HTE estimators. According to the taxonomy, this work sits in the 'Relative Error and Comparative Assessment' leaf, which contains only three papers total. This is a notably sparse research direction within the broader evaluation landscape, suggesting the paper addresses a relatively underexplored aspect of HTE estimator comparison. The leaf focuses specifically on frameworks using relative error or comparative metrics to rank estimators when ground truth is unobservable.

The taxonomy reveals that the broader 'Evaluation Metrics and Frameworks' branch contains five distinct evaluation approaches, including model selection methods, calibration assessment, matching-based evaluation, and experimental metrics. The paper's focus on relative error distinguishes it from neighboring leaves like 'Calibration and Uncertainty Quantification' (four papers) and 'Model Selection and Surrogate Metrics' (three papers). While calibration methods assess prediction reliability and model selection approaches develop surrogate metrics, this work concentrates on direct comparative assessment between estimators. The taxonomy's scope notes clarify that relative error frameworks explicitly exclude calibration assessment, positioning this contribution as complementary to rather than overlapping with uncertainty quantification approaches.

Among the three contributions analyzed, the evaluation framework based on relative error examined nine candidates with two appearing to provide overlapping prior work. The novel loss functions and neural network architecture for nuisance parameter estimation examined ten candidates with one potentially refutable match. The new learning algorithm for HTE examined ten candidates with no clear refutations found. These statistics reflect a limited search scope of twenty-nine total candidates examined across all contributions. The evaluation framework contribution shows the most substantial prior work overlap, while the learning algorithm appears more novel within the examined candidate set.

Based on the limited search scope of approximately thirty semantically similar papers, the work appears to make contributions in a relatively sparse research area. The taxonomy structure suggests that relative error-based evaluation remains underdeveloped compared to other evaluation approaches. However, the analysis acknowledges that this assessment derives from top-K semantic search rather than exhaustive literature review, and the refutation statistics reflect only the examined candidates rather than the complete field.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 3

Research Landscape Overview

Core task: evaluation of heterogeneous treatment effect estimators. The field has organized itself around several complementary dimensions. Evaluation Metrics and Frameworks focuses on how to measure estimator performance, including relative error approaches like Relative Error Evaluation[0] and Trustworthy Relative Error[8], as well as broader assessment strategies such as Trustworthy Assessment HTE[30]. Estimation Methods and Algorithms encompasses the diverse algorithmic landscape, from forest-based approaches like Random Forests HTE[5] and Forest Estimators Work[1] to meta-learners such as Metalearners[13] and ensemble strategies like Ensemble ITE Method[3]. Methodological Considerations and Robustness addresses stability and reliability concerns, exemplified by Stable HTE Estimation[2] and Robust ITE Method[4]. Specialized Settings and Extensions covers adaptations to particular data structures or constraints, including panel data methods like Two-way Fixed Effects[29] and survival analysis via Causal Survival Forests[34]. Application Domains and Practical Implementation bridges theory and practice, with works like Student Success Ensemble[42] demonstrating real-world deployment.

A central tension in the field concerns how to reliably compare estimators when ground-truth heterogeneous effects are unobserved. Some lines of work emphasize model selection and comparative assessment, as seen in Model Selection Comparison[9] and Reliable Estimator Selection[17], while others focus on calibration and trustworthiness guarantees, such as Causal Isotonic Calibration[33] and Calibration Assessment Nonparametric[41].

Relative Error Evaluation[0] sits squarely within the comparative assessment cluster, sharing with Trustworthy Relative Error[8] and Trustworthy Assessment HTE[30] a focus on developing metrics that can rank estimators without requiring oracle knowledge of true treatment effects. Where Trustworthy Assessment HTE[30] may emphasize broader frameworks for trustworthiness, Relative Error Evaluation[0] appears to concentrate specifically on relative error measures as a principled basis for comparison, offering a complementary perspective on how practitioners might choose among competing methods in observational settings.

Claimed Contributions

Robust evaluation framework for HTE estimators based on relative error

The authors introduce an evaluation framework that uses relative error to compare heterogeneous treatment effect estimators. This framework achieves robustness by relaxing the requirement for consistent outcome regression models while maintaining desirable statistical properties such as root-n consistency and asymptotic normality.

9 retrieved papers
Can Refute
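To make the comparison idea concrete, the sketch below illustrates one standard way a relative-error-style metric can be computed without observing the true CATE: score each candidate estimator against a doubly robust (AIPW) pseudo-outcome and take the ratio of mean squared errors. This is an illustrative stand-in, not the paper's estimator; the function names (`dr_pseudo_outcome`, `relative_error`), the synthetic data, and the use of oracle nuisances are all assumptions for the demo, whereas the paper's construction specifically relaxes the consistency requirement on the outcome regressions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Synthetic data with a known CATE: tau(x) = 1 + x.
x = rng.normal(size=n)
tau = 1.0 + x
t = rng.binomial(1, 0.5, size=n)          # randomized treatment, e(x) = 0.5
y = x + t * tau + rng.normal(size=n)      # mu0(x) = x, mu1(x) = x + tau(x)

def dr_pseudo_outcome(y, t, e_hat, mu0_hat, mu1_hat):
    """AIPW pseudo-outcome; its conditional mean given x equals the CATE."""
    return (mu1_hat - mu0_hat
            + t * (y - mu1_hat) / e_hat
            - (1 - t) * (y - mu0_hat) / (1 - e_hat))

def relative_error(tau_a, tau_b, phi):
    """Plug-in relative error of estimator A against estimator B: the ratio of
    their mean squared errors against the pseudo-outcome phi.
    Values below 1 favour A, values above 1 favour B."""
    return np.mean((phi - tau_a) ** 2) / np.mean((phi - tau_b) ** 2)

phi = dr_pseudo_outcome(y, t, 0.5, x, x + tau)   # oracle nuisances, for illustration
tau_a = tau                                      # a perfect candidate estimator
tau_b = tau + rng.normal(scale=1.0, size=n)      # a noisier candidate
ratio = relative_error(tau_a, tau_b, phi)
```

Because the same noisy pseudo-outcome appears in both numerator and denominator, its variance partially cancels in the comparison; here the perfect candidate yields a ratio below one, correctly ranking it above the noisier one.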
Novel loss functions and neural network architecture for nuisance parameter estimation

The authors design new loss functions (weighted least squares loss and balance regularizers) and propose a neural network architecture based on Dragonnet to estimate nuisance parameters (propensity scores and outcome regression models). This enables more reliable relative error estimation without requiring consistent outcome regression models.

10 retrieved papers
Can Refute
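As a rough illustration of the two loss ingredients named above, the sketch below shows a generic inverse-propensity-weighted least squares loss and a simple covariate-balance penalty (the squared gap between IPW-reweighted covariate means of the two arms). These are hedged stand-ins assumed for this demo, not the paper's actual losses or its Dragonnet-based architecture; all function names and the toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Confounded toy data: treatment probability depends on the covariate.
x = rng.normal(size=(n, 1))
e = 1.0 / (1.0 + np.exp(-x[:, 0]))        # true propensity score
t = rng.binomial(1, e)
y = x[:, 0] + t * 1.0 + rng.normal(size=n)

def weighted_ls_loss(y, t, mu_hat, e_hat):
    """Inverse-propensity-weighted least squares loss for an outcome model."""
    w = t / e_hat + (1 - t) / (1 - e_hat)
    return float(np.mean(w * (y - mu_hat) ** 2))

def balance_regularizer(x, t, e_hat):
    """Squared gap between IPW-reweighted covariate means of the two arms;
    small when e_hat successfully balances treated and control groups."""
    w1, w0 = t / e_hat, (1 - t) / (1 - e_hat)
    m1 = (w1[:, None] * x).sum(axis=0) / w1.sum()
    m0 = (w0[:, None] * x).sum(axis=0) / w0.sum()
    return float(np.sum((m1 - m0) ** 2))

naive_gap_sq = float((x[t == 1].mean() - x[t == 0].mean()) ** 2)
reg = balance_regularizer(x, t, e)        # near zero under the true propensity
loss = weighted_ls_loss(y, t, x[:, 0] + t, e)
```

The point of such a regularizer is that a propensity head can be penalized for residual imbalance directly, rather than trusting that a well-fit outcome model exists; with the true propensity, the weighted gap is far below the raw between-group gap.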
New learning algorithm for HTE leveraging nuisance parameters

The authors develop a learning algorithm for heterogeneous treatment effects that aggregates information from candidate HTE estimators and nuisance parameters estimated by their proposed neural network. This algorithm demonstrates improved performance compared to existing methods.

10 retrieved papers
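One simple form such an aggregation step could take is a stacking-style combination: regress a pseudo-outcome proxy on the candidate CATE predictions and use the fitted weights to blend them. The sketch below assumes this stacking reading; `aggregate_cate`, the pseudo-outcome `phi`, and the synthetic candidates are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3000
tau = 1.0 + rng.normal(size=n)             # stand-in for the true CATE values

# Two imperfect candidate HTE estimators and a noisy but unbiased proxy target.
cand1 = tau + rng.normal(scale=0.5, size=n)
cand2 = tau + rng.normal(scale=1.0, size=n)
phi = tau + rng.normal(scale=2.0, size=n)

def aggregate_cate(candidates, phi):
    """Combine candidate CATE predictions by least squares against phi.
    Since each single candidate is itself a feasible weight vector, the fitted
    combination can do no worse in-sample than any individual candidate."""
    F = np.column_stack(candidates)        # (n, K) candidate predictions
    w, *_ = np.linalg.lstsq(F, phi, rcond=None)
    return F @ w, w

agg, w = aggregate_cate([cand1, cand2], phi)
mse_agg = np.mean((phi - agg) ** 2)
mse_best = min(np.mean((phi - cand1) ** 2), np.mean((phi - cand2) ** 2))
```

The guarantee illustrated here is purely in-sample and relative to the proxy; whether the blend improves error against the true CATE depends on the quality of the nuisance estimates, which is exactly what the paper's architecture is designed to supply.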

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Robust evaluation framework for HTE estimators based on relative error

Contribution 2: Novel loss functions and neural network architecture for nuisance parameter estimation

Contribution 3: New learning algorithm for HTE leveraging nuisance parameters
