Relatron: Automating Relational Machine Learning over Relational Databases

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: AutoML, Relational database, Relational deep learning, Graph machine learning, Tabular machine learning
Abstract:

Predictive modeling over relational databases (RDBs) powers applications in various domains, yet remains challenging due to the need to capture both cross-table dependencies and complex feature interactions. Recent Relational Deep Learning (RDL) methods automate feature engineering via message passing, while classical approaches like Deep Feature Synthesis (DFS) rely on predefined non-parametric aggregators. Despite promising performance gains, the comparative advantages of RDL over DFS and the design principles for selecting effective architectures remain poorly understood. We present a comprehensive study that unifies RDL and DFS in a shared design space and conducts large-scale architecture-centric searches across diverse RDB tasks. Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and (3) validation accuracy is an unreliable guide for architecture choice. This search yields a curated model performance bank that links model architecture configurations to their performance; leveraging this bank, we analyze the drivers of the RDL–DFS performance gap and introduce two task signals—RDB task homophily and an affinity embedding that captures path, feature, and temporal structure—whose correlation with the gap enables principled routing. Guided by these signals, we propose Relatron, a task embedding-based meta-selector that first chooses between RDL and DFS and then prunes the within-family search to deliver strong performance. Lightweight loss-landscape metrics further guard against brittle checkpoints by preferring flatter optima. 
In experiments, Relatron resolves the “more tuning, worse performance” effect and, in joint hyperparameter–architecture optimization, achieves up to 18.5% improvement over strong baselines with 10× lower computational cost than Fisher information–based alternatives.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a unified design space for relational deep learning (RDL) and deep feature synthesis (DFS), a large-scale architecture search across diverse relational database tasks, and a model performance bank linking configurations to outcomes. It also introduces Relatron, a task embedding-based meta-selector for architecture and hyperparameter optimization. According to the taxonomy, this work resides in the 'Meta-Learning and Task-Aware Model Selection for Relational Data' leaf, where it is currently the only paper. This indicates a sparse research direction within the broader field of automating architecture selection for relational database prediction tasks.

The taxonomy reveals neighboring leaves focused on direct architecture search (two papers) and automated feature engineering (one paper) for relational data, alongside branches addressing structured non-relational data (graphs, knowledge graphs) and database design automation. The paper's meta-learning approach diverges from direct search methods by routing between model families based on task characteristics rather than performing architecture search in isolation. Its scope explicitly excludes manual architecture design and non-relational data prediction, positioning it at the intersection of automated model selection and relational schema-driven adaptation, a boundary less explored than direct search or feature engineering alone.

Among 22 candidates examined across three contributions, none were found to clearly refute the paper's claims. The comprehensive design space and model performance bank examined 10 candidates with zero refutable overlaps; the RDL–DFS performance gap analysis examined 2 candidates with no refutations; and Relatron's task embedding-based meta-selector examined 10 candidates, also with no refutations. This suggests that within the limited search scope, the paper's contributions appear relatively novel, though the small candidate pool and sparse taxonomy leaf indicate that the field itself may be under-explored rather than definitively validating the work's originality.

Given the limited literature search (22 candidates from top-K semantic search and citation expansion), the analysis captures a snapshot of closely related work but does not constitute an exhaustive review. The absence of sibling papers in the same taxonomy leaf and the low refutation counts across contributions suggest the paper occupies a relatively uncharted niche, though broader searches or domain-specific venues might reveal additional relevant prior work not captured here.

Taxonomy

Core-task Taxonomy Papers: 11
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Papers: 0

Research Landscape Overview

Core task: Automating architecture selection for relational database prediction tasks.

The field encompasses several distinct branches that address how to build predictive models over structured data. The first branch, Relational Data Prediction with Automated Architecture Selection, focuses on meta-learning and task-aware model selection strategies that adapt neural architectures to the relational schema and prediction objectives at hand. A second branch, Automated Architecture Search for Structured Non-Relational Data, explores neural architecture search techniques for graph-like or hierarchical data representations, such as AutoSTG[1] for spatio-temporal graphs. The third branch, Database Design Automation and Query Interface Systems, tackles schema design and user-facing query tools, including works like Hybrid Query Interface[6] and Automate Relational Design[7]. Finally, Domain-Specific Structural Database Design for Prediction targets specialized database schemas tailored to particular application domains, exemplified by NAStructuralDB[4] and NAS Flight Data[11].

Within the landscape of relational data prediction, a central tension exists between fully automated end-to-end approaches and those requiring domain expertise or manual feature engineering. Works such as Automated Relational Data[2] and Auto Table Join[5] emphasize automating join path discovery and feature extraction, while Relational Action Forecasting[3] integrates temporal reasoning into relational models. Relatron[0] sits squarely in the meta-learning and task-aware model selection cluster, sharing with Automated Relational Data[2] an emphasis on learning from the relational structure itself, yet differing in its focus on architecture selection rather than purely feature-level automation. Compared to Relational Action Forecasting[3], which targets sequential prediction in relational settings, Relatron[0] addresses the broader challenge of choosing suitable model architectures across diverse relational prediction tasks, positioning it as a flexible framework for schema-driven model adaptation.

Claimed Contributions

Comprehensive design space and model performance bank for RDB tasks

The authors construct a unified design space covering both Relational Deep Learning (RDL) and Deep Feature Synthesis (DFS) methods, then conduct large-scale architecture searches to build a performance bank that maps architecture configurations to their performance across diverse RDB tasks.

Retrieved papers compared: 10

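The report does not describe how the performance bank is laid out internally. As a minimal sketch under assumed names (the class, task names, configurations, and metric values below are all illustrative, not the paper's API), such a bank could map (task, architecture configuration) pairs to observed metrics and support a best-configuration lookup:

```python
from dataclasses import dataclass, field

@dataclass
class PerformanceBank:
    """Hypothetical sketch of a model performance bank: maps
    (task, architecture config) keys to an observed metric."""
    records: dict = field(default_factory=dict)

    def add(self, task, config, metric):
        # Freeze the config dict into a hashable, order-independent key.
        self.records[(task, tuple(sorted(config.items())))] = metric

    def best_config(self, task):
        # Return the best-scoring configuration seen for this task, if any.
        hits = {cfg: m for (t, cfg), m in self.records.items() if t == task}
        return max(hits, key=hits.get) if hits else None

bank = PerformanceBank()
bank.add("churn-task", {"family": "RDL", "layers": 2}, 0.71)
bank.add("churn-task", {"family": "DFS", "depth": 2}, 0.74)
print(bank.best_config("churn-task"))  # (('depth', 2), ('family', 'DFS'))
```

A meta-selector can then query such a bank for the best-known configuration on tasks similar to a new one, rather than searching from scratch.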
Analysis of RDL–DFS performance gap and routing method

The authors identify that RDL does not consistently outperform DFS and introduce RDB task homophily plus affinity embeddings (capturing size, path, feature, and temporal structure) to explain the performance gap, enabling principled routing between the two paradigms.

Retrieved papers compared: 2

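The report gives no formulas for these task signals. As a toy sketch only, the homophily measure below (fraction of linked entity pairs sharing a label) and the 0.5/0.5 weighting are made up, and `predicted_gap` merely stands in for a learned predictor of the RDL-DFS gap from the affinity embedding:

```python
def task_homophily(edges, labels):
    """Fraction of linked entity pairs that share a label: a simple
    homophily proxy. Hypothetical; the paper's RDB task homophily
    measure may be defined differently."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def route(homophily, predicted_gap, threshold=0.5):
    """Route a task to RDL or DFS from two signals. The equal
    weighting and threshold here are illustrative assumptions."""
    score = 0.5 * homophily + 0.5 * predicted_gap
    return "RDL" if score > threshold else "DFS"

# Toy task: a 4-cycle where half the edges link same-label entities.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = {0: 1, 1: 1, 2: 1, 3: 0}
h = task_homophily(edges, labels)  # 0.5
print(route(h, 0.8))               # RDL (signals favor message passing)
print(route(0.1, 0.0))             # DFS
```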
Relatron: task embedding-based meta-selector for architecture and hyperparameter optimization

Relatron is a meta-selector that uses task embeddings to choose between RDL and DFS, then narrows the search space within the selected family, and applies loss-landscape metrics to guard against unreliable validation-based checkpoint selection, achieving improved performance with lower computational cost.

Retrieved papers compared: 10
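The loss-landscape guard can be illustrated with a crude perturbation-based sharpness proxy; this is an assumption, since the report does not specify Relatron's actual lightweight metrics or how they are combined with validation scores:

```python
import numpy as np

def sharpness(loss_fn, params, eps=1e-2, n_samples=8, seed=0):
    """Average loss increase under small random parameter perturbations;
    flatter optima score lower. A stand-in for the paper's metrics."""
    rng = np.random.default_rng(seed)
    base = loss_fn(params)
    rises = []
    for _ in range(n_samples):
        delta = rng.normal(size=params.shape)
        delta *= eps / np.linalg.norm(delta)  # fixed-radius perturbation
        rises.append(loss_fn(params + delta) - base)
    return float(np.mean(rises))

def pick_checkpoint(val_scores, sharpness_vals, lam=1.0):
    """Prefer checkpoints that combine a good validation score with a
    flat neighborhood; `lam` (assumed) trades the two off."""
    return max(range(len(val_scores)),
               key=lambda i: val_scores[i] - lam * sharpness_vals[i])

# A sharp quadratic bowl vs. a flat one, both minimized at the origin.
sharp = lambda w: 100.0 * float(w @ w)
flat = lambda w: 0.01 * float(w @ w)
w0 = np.zeros(3)
print(sharpness(sharp, w0) > sharpness(flat, w0))   # True
print(pick_checkpoint([0.90, 0.88], [0.50, 0.01]))  # 1 (flatter wins)
```

Under this scheme a checkpoint with a slightly lower validation score but a much flatter neighborhood is preferred, which is one way to guard against brittle, over-tuned checkpoints.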

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the retrieved top-K core-task papers, the paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Comprehensive design space and model performance bank for RDB tasks (10 candidates compared; no refuting overlap found)

Contribution 2: Analysis of RDL–DFS performance gap and routing method (2 candidates compared; no refuting overlap found)

Contribution 3: Relatron: task embedding-based meta-selector for architecture and hyperparameter optimization (10 candidates compared; no refuting overlap found)