TabStruct: Measuring Structural Fidelity of Tabular Data

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Tabular data, Tabular data structure, Synthetic data generation
Abstract:

Evaluating tabular generators remains a challenging problem, as the unique causal structural prior of heterogeneous tabular data does not lend itself to intuitive human inspection. Recent work has introduced structural fidelity as a tabular-specific evaluation dimension to assess whether synthetic data complies with the causal structures of real data. However, existing benchmarks often neglect the interplay between structural fidelity and conventional evaluation dimensions, thus failing to provide a holistic understanding of model performance. Moreover, they are typically limited to toy datasets, as quantifying existing structural fidelity metrics requires access to ground-truth causal structures, which are rarely available for real-world datasets. In this paper, we propose a novel evaluation framework that jointly considers structural fidelity and conventional evaluation dimensions. We introduce a new evaluation metric, global utility, which enables the assessment of structural fidelity even in the absence of ground-truth causal structures. In addition, we present TabStruct, a comprehensive evaluation benchmark offering large-scale quantitative analysis on 13 tabular generators from nine distinct categories, across 29 datasets. Our results demonstrate that global utility provides a task-independent, domain-agnostic lens for tabular generator performance. We release the TabStruct benchmark suite, including all datasets, evaluation pipelines, and raw results.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a multi-dimensional evaluation framework for synthetic tabular data that jointly assesses structural fidelity and conventional quality dimensions, introducing a 'global utility' metric that operates without ground-truth causal structures. It resides in the Multi-Dimensional Evaluation Frameworks leaf, which contains five papers including this one. This leaf sits within the broader Evaluation Frameworks and Methodologies branch, indicating a moderately populated research direction focused on holistic assessment approaches rather than isolated metrics or generation methods.

The taxonomy reveals neighboring work in Benchmark Suites and Comparative Studies (five papers) and Evaluation Tools and Platforms (five papers), both addressing systematic evaluation but with different emphases—standardized comparisons versus software implementation. The Structural and Relational Fidelity Assessment branch (two sub-leaves, eight papers total) focuses specifically on inter-column dependencies and heterogeneity, providing complementary depth to the multi-dimensional perspective. The paper bridges these areas by incorporating structural considerations into a comprehensive framework, distinguishing itself from purely statistical or utility-focused approaches in adjacent branches.

Among thirty candidates examined across three contributions, none yielded a clear refutation. For the global utility metric, ten candidates were examined and none overlapped in a refutable way, suggesting potential novelty in enabling structural assessment without ground-truth causal graphs. Ten candidates each were examined for the joint evaluation framework and the TabStruct benchmark, with the same result. This limited search scope (thirty papers from semantic retrieval) cannot confirm absolute novelty, but it indicates that, within the examined literature, no prior work explicitly combines these specific elements: causal-structure-agnostic structural metrics, multi-dimensional integration, and large-scale benchmarking across thirteen generators.

Based on the top thirty semantic matches and taxonomy positioning, the work appears to occupy a recognizable but not overcrowded niche. The Multi-Dimensional Evaluation Frameworks leaf contains four siblings, suggesting moderate prior activity in holistic assessment approaches. The absence of refutable candidates within this limited scope suggests the specific combination of contributions may be novel, though an exhaustive search across the broader fifty-paper taxonomy and beyond would be necessary to confirm originality conclusively.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Evaluating structural fidelity of synthetic tabular data. The field has organized itself around several complementary perspectives. Evaluation Frameworks and Methodologies provide overarching systems for assessing synthetic data quality, often combining multiple dimensions such as statistical resemblance, utility, and privacy. Structural and Relational Fidelity Assessment focuses specifically on whether generated tables preserve inter-column dependencies, logical constraints, and relational integrity, issues that simple marginal or distributional checks may miss. Evaluation Metrics and Measurement Approaches develop concrete scoring functions and distance measures, while Privacy-Utility Tradeoff Analysis examines the tension between data protection and downstream usefulness. Generation Methods and Comparative Analysis benchmarks different synthesizers (GANs, diffusion models, large language models), Domain-Specific Applications tackle challenges in healthcare or finance, Data-Centric and Preprocessing Approaches address data quality before generation, and Survey and Review Studies synthesize the landscape. Together, these branches reflect a maturing discipline that balances theoretical rigor with practical deployment concerns.

Recent work highlights the difficulty of capturing complex structural properties beyond univariate statistics. Multi-Dimensional Evaluation[31] and Critical Evaluation Challenges[18] emphasize that no single metric suffices; evaluators must consider fidelity, diversity, and privacy simultaneously. TabStruct[0] sits squarely within the Multi-Dimensional Evaluation Frameworks branch, proposing a structured approach to assess how well synthetic data preserves intricate dependencies and logical relationships. It shares common ground with Synthetic Tabular Quality[3], which also advocates for holistic quality measures, and with Complex Tabular Evaluation[5], which stresses the need to go beyond simple distributional tests. Meanwhile, works like Inter-Column Logical Relationships[1] and Benchmarking Relational Data[2] drill into specific structural aspects (foreign keys, functional dependencies) that TabStruct[0] aims to incorporate into a unified framework. The central open question remains how to balance computational cost, interpretability, and coverage when evaluating increasingly sophisticated generative models across diverse application domains.

Claimed Contributions

Global utility metric for structural fidelity assessment

The authors propose global utility, a novel metric that allows evaluation of how well synthetic tabular data preserves causal structures without requiring access to ground-truth causal graphs, addressing a key limitation of existing structural fidelity metrics that only work on toy datasets with known causal structures.
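This report does not reproduce the formula for global utility, so the following is only an illustrative sketch of one way a structure-aware score can be computed without a causal graph, not the authors' definition. It assumes an all-numeric table, treats every column in turn as a prediction target, fits a simple linear predictor on either real or synthetic training rows, and scores both on held-out real rows. The names `global_utility_proxy`, `column_as_target_scores`, and `make_table`, the choice of linear least squares, and the use of R² are all hypothetical choices made for this example.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination; returns 0.0 for a constant target."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0

def fit_predict(train_X, train_y, test_X):
    """Ordinary least squares with an intercept (a stand-in predictor)."""
    A = np.column_stack([train_X, np.ones(len(train_X))])
    coef, *_ = np.linalg.lstsq(A, train_y, rcond=None)
    B = np.column_stack([test_X, np.ones(len(test_X))])
    return B @ coef

def column_as_target_scores(train, real_test):
    """Predict each column from all other columns; evaluate on real test rows."""
    n_cols = train.shape[1]
    scores = []
    for j in range(n_cols):
        others = [k for k in range(n_cols) if k != j]
        pred = fit_predict(train[:, others], train[:, j], real_test[:, others])
        scores.append(r2_score(real_test[:, j], pred))
    return np.array(scores)

def global_utility_proxy(real_train, synthetic, real_test):
    """Mean R^2 gap (synthetic-trained minus real-trained) over all columns.

    Hypothetical proxy: values near 0 mean synthetic rows support the same
    column-to-column predictions as real rows; negative values mean the
    dependencies between columns were lost.
    """
    real_scores = column_as_target_scores(real_train, real_test)
    syn_scores = column_as_target_scores(synthetic, real_test)
    return float(np.mean(syn_scores - real_scores))

def make_table(n, rng):
    """Toy table with a known dependency: x2 = 2*x0 - x1 + noise."""
    x0 = rng.normal(size=n)
    x1 = rng.normal(size=n)
    x2 = 2.0 * x0 - x1 + 0.1 * rng.normal(size=n)
    return np.column_stack([x0, x1, x2])

rng = np.random.default_rng(0)
real_train = make_table(2000, rng)
real_test = make_table(1000, rng)

# "Faithful" synthetic data: fresh draws from the same process.
faithful = make_table(2000, rng)
# "Shuffled" synthetic data: same marginals, but columns made independent.
shuffled = np.column_stack(
    [rng.permutation(real_train[:, j]) for j in range(real_train.shape[1])]
)

faithful_gap = global_utility_proxy(real_train, faithful, real_test)
shuffled_gap = global_utility_proxy(real_train, shuffled, real_test)
print(f"faithful gap: {faithful_gap:.3f}, shuffled gap: {shuffled_gap:.3f}")
```

The shuffled table keeps every marginal distribution intact, so a purely univariate check would not flag it, while the all-columns-as-targets score drops sharply. That contrast is the kind of behaviour the report attributes to a structural metric that works without ground-truth causal graphs.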

10 retrieved papers

Evaluation framework jointly considering structural fidelity and conventional dimensions

The authors develop a comprehensive evaluation framework that integrates structural fidelity assessment with traditional evaluation dimensions such as density estimation, ML efficacy, and privacy preservation, providing a more holistic understanding of tabular generator performance than prior work.

10 retrieved papers

TabStruct benchmark suite

The authors introduce TabStruct, a large-scale benchmark that evaluates 13 tabular generators across 29 datasets with multiple evaluation dimensions, addressing the limited scope of existing benchmarks and providing datasets, evaluation pipelines, and raw results as open resources.

10 retrieved papers
