TabStruct: Measuring Structural Fidelity of Tabular Data
Overview
Overall Novelty Assessment
The paper proposes a multi-dimensional evaluation framework for synthetic tabular data that jointly assesses structural fidelity and conventional quality dimensions, and introduces a 'global utility' metric that operates without ground-truth causal structures. It resides in the Multi-Dimensional Evaluation Frameworks leaf, which contains five papers including this one. That leaf sits within the broader Evaluation Frameworks and Methodologies branch, and its size points to a moderately populated research direction focused on holistic assessment rather than on isolated metrics or generation methods.
The taxonomy places neighboring work in Benchmark Suites and Comparative Studies (five papers) and Evaluation Tools and Platforms (five papers). Both address systematic evaluation, but with different emphases: standardized comparisons in the former, software implementation in the latter. The Structural and Relational Fidelity Assessment branch (two sub-leaves, eight papers in total) focuses specifically on inter-column dependencies and heterogeneity, complementing the multi-dimensional perspective with depth on structure. The paper bridges these areas by building structural considerations into a comprehensive framework, which distinguishes it from the purely statistical or utility-focused approaches in adjacent branches.
Among thirty candidates examined across the three contributions, none yielded a clear refutation. For the global utility metric, ten candidates were examined and none refuted it, suggesting potential novelty in enabling structural assessment without ground-truth causal graphs. The joint evaluation framework and the TabStruct benchmark were each checked against ten candidates, with the same outcome. This limited search scope (thirty papers from semantic retrieval) cannot establish absolute novelty, but within the examined literature no prior work explicitly combines these specific elements: causal-structure-agnostic structural metrics, multi-dimensional integration, and large-scale benchmarking across thirteen generators.
Based on the top thirty semantic matches and its taxonomy position, the work appears to occupy a recognizable but not overcrowded niche. Its four sibling papers in the Multi-Dimensional Evaluation Frameworks leaf indicate moderate prior activity in holistic assessment. The absence of refuting candidates within this limited scope suggests that the specific combination of contributions may be novel, though an exhaustive search across the broader fifty-paper taxonomy and beyond would be needed to confirm originality conclusively.
Claimed Contributions
The authors propose global utility, a novel metric that allows evaluation of how well synthetic tabular data preserves causal structures without requiring access to ground-truth causal graphs, addressing a key limitation of existing structural fidelity metrics that only work on toy datasets with known causal structures.
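This section does not give a formal definition of global utility. One plausible reading, offered here only as an assumption, is to treat each column in turn as a prediction target, fit a predictor on the synthetic data, score it on held-out real data, and average the per-column scores. A generator that breaks inter-column dependencies then loses predictive power for many targets at once, without any causal graph being needed. The minimal sketch below uses a k-nearest-neighbour regressor and R² purely for illustration; the names `global_utility`, `knn_predict`, `r2_score`, and `sample_chain` are hypothetical, and TabStruct may use different predictors, scores, and aggregation.

```python
import math
import random
from statistics import fmean

Row = list[float]


def knn_predict(train: list[Row], query: Row, target: int, k: int = 5) -> float:
    """Predict column `target` of `query` as the mean of that column over the k
    training rows nearest to `query` on all remaining columns (Euclidean,
    no feature scaling, for brevity)."""
    def distance(row: Row) -> float:
        return math.sqrt(sum((a - b) ** 2
                             for i, (a, b) in enumerate(zip(row, query))
                             if i != target))
    nearest = sorted(train, key=distance)[:k]
    return fmean(row[target] for row in nearest)


def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination of predictions against real targets."""
    mean = fmean(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def global_utility(synthetic: list[Row], real_test: list[Row], k: int = 5) -> float:
    """Hypothetical global utility: train on synthetic, test on real, once per
    column as the target, then average the per-column R² scores."""
    scores = []
    for target in range(len(real_test[0])):
        y_true = [row[target] for row in real_test]
        y_pred = [knn_predict(synthetic, row, target, k) for row in real_test]
        scores.append(r2_score(y_true, y_pred))
    return fmean(scores)


def sample_chain(n: int, rng: random.Random) -> list[Row]:
    """Toy data with causal edges x0 -> x1, x0 -> x2, x1 -> x2."""
    rows = []
    for _ in range(n):
        x0 = rng.random()
        x1 = 2 * x0 + rng.gauss(0, 0.1)
        x2 = x1 - x0 + rng.gauss(0, 0.1)
        rows.append([x0, x1, x2])
    return rows


rng = random.Random(0)
real_test = sample_chain(100, rng)
faithful = sample_chain(300, rng)  # same generating process as the real data

# Shuffle each column independently: marginals survive, dependencies do not.
shuffled_columns = [list(col) for col in zip(*faithful)]
for col in shuffled_columns:
    rng.shuffle(col)
shuffled = [list(row) for row in zip(*shuffled_columns)]

faithful_score = global_utility(faithful, real_test)
shuffled_score = global_utility(shuffled, real_test)
print(f"faithful: {faithful_score:.3f}  shuffled: {shuffled_score:.3f}")
```

Under these assumptions the faithful sample should receive a high score and the column-shuffled sample a score near or below zero, even though the shuffled table keeps every marginal distribution of the faithful one. That gap is the kind of structural signal a ground-truth-free metric needs to expose.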
The authors develop a comprehensive evaluation framework that integrates structural fidelity assessment with traditional evaluation dimensions such as density estimation, ML efficacy, and privacy preservation, providing a more holistic understanding of tabular generator performance than prior work.
The authors introduce TabStruct, a large-scale benchmark that evaluates 13 tabular generators across 29 datasets with multiple evaluation dimensions, addressing the limited scope of existing benchmarks and providing datasets, evaluation pipelines, and raw results as open resources.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[18] Critical Challenges and Guidelines in Evaluating Synthetic Tabular Data: A Systematic Review
[31] Benchmarking Synthetic Tabular Data: A Multi-Dimensional Evaluation Framework
[38] Synthetic tabular data evaluation in the health domain covering resemblance, utility, and privacy dimensions
[39] FEST: A Unified Framework for Evaluating Synthetic Tabular Data
Contribution Analysis
Detailed comparisons for each claimed contribution
Global utility metric for structural fidelity assessment
[5] Evaluation of synthetic data generators on complex tabular data
[8] LLM-TabLogic: Preserving Inter-Column Logical Relationships in Synthetic Tabular Data via Prompt-Guided Latent Diffusion
[21] Structured Evaluation of Synthetic Tabular Data
[22] A Comparative Study of Open-Source Libraries for Synthetic Tabular Data Generation: SDV vs. SynthCity
[23] Preserving logical and functional dependencies in synthetic tabular data
[30] Evaluating Fidelity and Machine Learning Utility of Synthetic Tabular Data Generated Using Generative Models
[44] A Quantitative Comparison of Structural and Distributional Properties of Synthetic Tabular Data in Parkinson's Disease
[51] Improving the generation and evaluation of synthetic data for downstream medical causal inference
[52] TabularARGN: A flexible and efficient auto-regressive framework for generating high-fidelity synthetic data
[53] Dependency-aware synthetic tabular data generation
Evaluation framework jointly considering structural fidelity and conventional dimensions
[61] Generative Models in Computational Pathology: A Comprehensive Survey on Methods, Applications, and Challenges
[62] Grid-Based Decompositions for Spatial Data under Local Differential Privacy
[63] TLPP: Deep-Learning-Based Two-Layer Privacy Preserving Mechanism for Protecting Vehicle Trajectory Data
[64] Integration Of Machine Learning and Advanced Computing For Optimizing Retail Customer Analytics
[65] SynthVal: A Framework for Validating Synthetic Medical Images
[66] Differentially Private Graph Data Publishing via Feature-Based Community Detection
[67] Differentially private learning of structured discrete distributions
[68] OTTER: Optimized Training with Trustworthy Enhanced Replication via Diffusion and Federated VMUNet for Privacy-Aware Medical Segmentation
[69] Utilizing synthetic data for privacy-preserving AI modeling in radiomics: a case study
[70] Preserving privacy and fidelity via Ehrhart theory
TabStruct benchmark suite