Relatron: Automating Relational Machine Learning over Relational Databases

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: AutoML, Relational database, Relational deep learning, Graph machine learning, Tabular machine learning
Abstract:

Predictive modeling over relational databases (RDBs) powers applications in various domains, yet remains challenging due to the need to capture both cross-table dependencies and complex feature interactions. Recent Relational Deep Learning (RDL) methods automate feature engineering via message passing, while classical approaches like Deep Feature Synthesis (DFS) rely on predefined non-parametric aggregators. Despite promising performance gains, the comparative advantages of RDL over DFS and the design principles for selecting effective architectures remain poorly understood. We present a comprehensive study that unifies RDL and DFS in a shared design space and conducts large-scale architecture-centric searches across diverse RDB tasks. Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and (3) validation accuracy is an unreliable guide for architecture choice. This search yields a curated model performance bank that links model architecture configurations to their performance; leveraging this bank, we analyze the drivers of the RDL–DFS performance gap and introduce two task signals—RDB task homophily and an affinity embedding that captures path, feature, and temporal structure—whose correlation with the gap enables principled routing. Guided by these signals, we propose Relatron, a task embedding-based meta-selector that first chooses between RDL and DFS and then prunes the within-family search to deliver strong performance. Lightweight loss-landscape metrics further guard against brittle checkpoints by preferring flatter optima. 
In experiments, Relatron resolves the “more tuning, worse performance” effect and, in joint hyperparameter–architecture optimization, achieves up to 18.5% improvement over strong baselines with 10× lower computational cost than Fisher information–based alternatives.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a unified design space for relational deep learning (RDL) and deep feature synthesis (DFS), a large-scale architecture search across diverse relational database tasks, and a model performance bank linking configurations to outcomes. It also introduces Relatron, a task embedding-based meta-selector for architecture and hyperparameter optimization. According to the taxonomy, this work resides in the 'Meta-Learning and Task-Aware Model Selection for Relational Data' leaf, where it is currently the only paper. This indicates a sparse research direction within the broader field of automating architecture selection for relational database prediction tasks.

The taxonomy reveals neighboring leaves focused on direct architecture search (two papers) and automated feature engineering (one paper) for relational data, alongside branches addressing structured non-relational data (graphs, knowledge graphs) and database design automation. The paper's meta-learning approach diverges from direct search methods by routing between model families based on task characteristics rather than performing architecture search in isolation. Its scope explicitly excludes manual architecture design and non-relational data prediction, positioning it at the intersection of automated model selection and relational schema-driven adaptation, a boundary less explored than direct search or feature engineering alone.

Among 22 candidates examined across three contributions, none were found to clearly refute the paper's claims. The comprehensive design space and model performance bank examined 10 candidates with zero refutable overlaps; the RDL–DFS performance gap analysis examined 2 candidates with no refutations; and Relatron's task embedding-based meta-selector examined 10 candidates, also with no refutations. This suggests that within the limited search scope, the paper's contributions appear relatively novel, though the small candidate pool and sparse taxonomy leaf indicate that the field itself may be under-explored rather than definitively validating the work's originality.

Given the limited literature search (22 candidates from top-K semantic search and citation expansion), the analysis captures a snapshot of closely related work but does not constitute an exhaustive review. The absence of sibling papers in the same taxonomy leaf and the low refutation counts across contributions suggest the paper occupies a relatively uncharted niche, though broader searches or domain-specific venues might reveal additional relevant prior work not captured here.

Taxonomy

Core-task Taxonomy Papers: 11
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Papers: 0

Research Landscape Overview

Core task: Automating architecture selection for relational database prediction tasks.

The field encompasses several distinct branches that address how to build predictive models over structured data. The first branch, Relational Data Prediction with Automated Architecture Selection, focuses on meta-learning and task-aware model selection strategies that adapt neural architectures to the relational schema and prediction objectives at hand. A second branch, Automated Architecture Search for Structured Non-Relational Data, explores neural architecture search techniques for graph-like or hierarchical data representations, such as AutoSTG[1] for spatio-temporal graphs. The third branch, Database Design Automation and Query Interface Systems, tackles schema design and user-facing query tools, including works like Hybrid Query Interface[6] and Automate Relational Design[7]. Finally, Domain-Specific Structural Database Design for Prediction targets specialized database schemas tailored to particular application domains, exemplified by NAStructuralDB[4] and NAS Flight Data[11].

Within the landscape of relational data prediction, a central tension exists between fully automated end-to-end approaches and those requiring domain expertise or manual feature engineering. Works such as Automated Relational Data[2] and Auto Table Join[5] emphasize automating join path discovery and feature extraction, while Relational Action Forecasting[3] integrates temporal reasoning into relational models. Relatron[0] sits squarely in the meta-learning and task-aware model selection cluster, sharing with Automated Relational Data[2] an emphasis on learning from the relational structure itself, yet differing in its focus on architecture selection rather than purely feature-level automation. Compared to Relational Action Forecasting[3], which targets sequential prediction in relational settings, Relatron[0] addresses the broader challenge of choosing suitable model architectures across diverse relational prediction tasks, positioning it as a flexible framework for schema-driven model adaptation.

Claimed Contributions

Comprehensive design space and model performance bank for RDB tasks

The authors construct a unified design space covering both Relational Deep Learning (RDL) and Deep Feature Synthesis (DFS) methods, then conduct large-scale architecture searches to build a performance bank that maps architecture configurations to their performance across diverse RDB tasks.

Retrieved papers compared: 10

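The report does not describe how the performance bank is laid out internally. As a minimal sketch under assumed names (the class, task names, configurations, and metric values below are all illustrative, not the paper's API), such a bank could map (task, architecture configuration) pairs to observed metrics and support a best-configuration lookup:

```python
from dataclasses import dataclass, field

@dataclass
class PerformanceBank:
    """Hypothetical sketch of a model performance bank: maps
    (task, architecture config) keys to an observed metric."""
    records: dict = field(default_factory=dict)

    def add(self, task, config, metric):
        # Freeze the config dict into a hashable, order-independent key.
        self.records[(task, tuple(sorted(config.items())))] = metric

    def best_config(self, task):
        # Return the best-scoring configuration seen for this task, if any.
        hits = {cfg: m for (t, cfg), m in self.records.items() if t == task}
        return max(hits, key=hits.get) if hits else None

bank = PerformanceBank()
bank.add("churn-task", {"family": "RDL", "layers": 2}, 0.71)
bank.add("churn-task", {"family": "DFS", "depth": 2}, 0.74)
print(bank.best_config("churn-task"))  # (('depth', 2), ('family', 'DFS'))
```

A meta-selector can then query such a bank for the best-known configuration on tasks similar to a new one, rather than searching from scratch.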
Analysis of RDL–DFS performance gap and routing method

The authors identify that RDL does not consistently outperform DFS and introduce RDB task homophily plus affinity embeddings (capturing size, path, feature, and temporal structure) to explain the performance gap, enabling principled routing between the two paradigms.

Retrieved papers compared: 2

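The report gives no formulas for these task signals. As a toy sketch only, the homophily measure below (fraction of linked entity pairs sharing a label) and the 0.5/0.5 weighting are made up, and `predicted_gap` merely stands in for a learned predictor of the RDL-DFS gap from the affinity embedding:

```python
def task_homophily(edges, labels):
    """Fraction of linked entity pairs that share a label: a simple
    homophily proxy. Hypothetical; the paper's RDB task homophily
    measure may be defined differently."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def route(homophily, predicted_gap, threshold=0.5):
    """Route a task to RDL or DFS from two signals. The equal
    weighting and threshold here are illustrative assumptions."""
    score = 0.5 * homophily + 0.5 * predicted_gap
    return "RDL" if score > threshold else "DFS"

# Toy task: a 4-cycle where half the edges link same-label entities.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = {0: 1, 1: 1, 2: 1, 3: 0}
h = task_homophily(edges, labels)  # 0.5
print(route(h, 0.8))               # RDL (signals favor message passing)
print(route(0.1, 0.0))             # DFS
```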
Relatron: task embedding-based meta-selector for architecture and hyperparameter optimization

Relatron is a meta-selector that uses task embeddings to choose between RDL and DFS, then narrows the search space within the selected family, and applies loss-landscape metrics to guard against unreliable validation-based checkpoint selection, achieving improved performance with lower computational cost.

Retrieved papers compared: 10
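The loss-landscape guard can be illustrated with a crude perturbation-based sharpness proxy; this is an assumption, since the report does not specify Relatron's actual lightweight metrics or how they are combined with validation scores:

```python
import numpy as np

def sharpness(loss_fn, params, eps=1e-2, n_samples=8, seed=0):
    """Average loss increase under small random parameter perturbations;
    flatter optima score lower. A stand-in for the paper's metrics."""
    rng = np.random.default_rng(seed)
    base = loss_fn(params)
    rises = []
    for _ in range(n_samples):
        delta = rng.normal(size=params.shape)
        delta *= eps / np.linalg.norm(delta)  # fixed-radius perturbation
        rises.append(loss_fn(params + delta) - base)
    return float(np.mean(rises))

def pick_checkpoint(val_scores, sharpness_vals, lam=1.0):
    """Prefer checkpoints that combine a good validation score with a
    flat neighborhood; `lam` (assumed) trades the two off."""
    return max(range(len(val_scores)),
               key=lambda i: val_scores[i] - lam * sharpness_vals[i])

# A sharp quadratic bowl vs. a flat one, both minimized at the origin.
sharp = lambda w: 100.0 * float(w @ w)
flat = lambda w: 0.01 * float(w @ w)
w0 = np.zeros(3)
print(sharpness(sharp, w0) > sharpness(flat, w0))   # True
print(pick_checkpoint([0.90, 0.88], [0.50, 0.01]))  # 1 (flatter wins)
```

Under this scheme a checkpoint with a slightly lower validation score but a much flatter neighborhood is preferred, which is one way to guard against brittle, over-tuned checkpoints.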

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the retrieved top-K core-task papers, the paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Comprehensive design space and model performance bank for RDB tasks (10 candidates compared; no refuting overlap found)

Contribution 2: Analysis of RDL–DFS performance gap and routing method (2 candidates compared; no refuting overlap found)

Contribution 3: Relatron: task embedding-based meta-selector for architecture and hyperparameter optimization (10 candidates compared; no refuting overlap found)