Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: foundation models, relational deep learning, relational data, transformer
Abstract:

Pretrained transformers readily adapt to new sequence modeling tasks via zero-shot prompting, but relational domains still lack architectures that transfer across datasets and tasks. The core challenge is the diversity of relational data, with varying heterogeneous schemas, graph structures, and functional dependencies. We propose the Relational Transformer (RT), a cell-level architecture pretrained on diverse relational databases and directly applicable to unseen datasets and tasks, without any need for task- or dataset-specific fine-tuning or retrieval of in-context examples. RT (i) tokenizes cells with table/column metadata, (ii) is pretrained via masked token prediction, and (iii) utilizes a novel Relational Attention mechanism over columns, rows, and primary–foreign key links. Pretrained on RelBench datasets spanning tasks such as churn and sales forecasting, RT attains strong zero-shot performance; on binary classification it averages 94% of fully supervised AUROC in a single forward pass, and fine-tuning yields state-of-the-art results with high sample efficiency. Our experiments show that RT’s zero-shot transfer harnesses task-table context, column and feature attention, and schema semantics. Overall, RT provides a practical path toward foundation models for relational data.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. The results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a Relational Transformer architecture pretrained on diverse relational databases for zero-shot transfer to unseen datasets and tasks. It resides in the 'Relational Transfer Learning Foundations' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. This leaf focuses on core methods for transferring relational knowledge across domains with minimal target data, distinguishing it from query translation or database optimization branches that dominate other parts of the field.

The taxonomy tree positions this work within 'Cross-Domain and Cross-Task Transfer Mechanisms', adjacent to leaves addressing event reasoning and schema-guided dialog systems. Neighboring branches include 'Query Translation and Natural Language Interfaces' (heavily populated with LLM-based text-to-SQL methods) and 'Schema and Knowledge Structure Learning' (focused on schema inference and knowledge graph completion). The paper's cell-level tokenization and relational attention diverge from these directions by targeting architectural pretraining rather than prompt engineering or schema extraction, bridging neural representation learning with relational reasoning.

Among the 26 candidates examined across the three contributions, none was flagged as clearly refuting the proposed methods. Ten candidates were reviewed for the Relational Transformer architecture and ten for the Relational Attention mechanism, with zero refutable overlaps in each case; six candidates were examined for cell-level tokenization, with the same outcome. This suggests that within the limited search scope (primarily top-K semantic matches and citation expansion), the specific combination of cell-level pretraining, relational attention over primary-foreign key links, and zero-shot transfer to heterogeneous schemas appears relatively unexplored in prior work.

Based on the examined literature, the work occupies a sparsely populated niche combining architectural pretraining with relational structure encoding. The analysis covers a focused set of semantically related candidates rather than an exhaustive field survey, so conclusions reflect novelty within this bounded scope. The taxonomy context indicates that while transfer learning for relational data is an active area, foundational architectures enabling true zero-shot generalization across diverse schemas remain underrepresented compared to task-specific or prompt-based approaches.

Taxonomy

Core-task Taxonomy Papers: 36
Claimed Contributions: 3
Contribution Candidate Papers Compared: 26
Refutable Papers: 0

Research Landscape Overview

Core task: zero-shot transfer learning for relational databases. The field addresses how models can generalize to new database schemas, queries, or domains without task-specific retraining.

The taxonomy reveals several complementary directions. Query Translation and Natural Language Interfaces focus on converting user questions into executable SQL, often leveraging large language models and schema-aware prompting strategies (e.g., Think2SQL[6], Prompt LLMs Text2SQL[9]). Schema and Knowledge Structure Learning emphasizes extracting and representing relational semantics, including schema linking and knowledge graph construction (e.g., Zero-shot Knowledge Graph[3], CONSchema[26]). Database Optimization and Performance Prediction targets cost estimation and query planning under novel workloads (e.g., CardBench[2], QORA[21]). Cross-Domain and Cross-Task Transfer Mechanisms explore foundational techniques for adapting learned representations across different relational settings, while Specialized Applications and Domain-Specific Transfer apply these ideas to particular verticals such as geospatial data or event schemas.

Recent work highlights a tension between end-to-end neural approaches and modular pipelines that decompose schema understanding from query generation. Many studies pursue few-shot or prompt-based methods to handle unseen schemas (AMAZe[1], Small Large NL2SQL[12]), yet foundational transfer mechanisms remain critical for truly zero-shot scenarios.

Relational Transformer[0] sits within Cross-Domain and Cross-Task Transfer Mechanisms, specifically under Relational Transfer Learning Foundations, emphasizing architectural designs that encode relational structure for broad generalization. It contrasts with earlier symbolic methods like Relational Domains Transfer[24] and representation-focused work such as Abstract Relational Features[34], offering a neural pathway that bridges schema-agnostic pretraining with downstream database tasks. This positioning reflects ongoing efforts to unify language understanding with relational reasoning, a central challenge as databases grow more heterogeneous and query interfaces more conversational.

Claimed Contributions

Relational Transformer (RT) architecture for relational databases

The authors introduce a novel transformer architecture designed specifically for relational databases that operates at the cell level, enabling pretraining on diverse databases and zero-shot transfer to new datasets and tasks without fine-tuning or retrieval of in-context examples.

10 retrieved papers

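The zero-shot setup this contribution describes can be pictured with a small sketch: rows reachable from the task entity through foreign keys are gathered as context, and the task label becomes a masked cell for the pretrained model to fill in. The `db`/`fks` layout, the `gather_context` helper, and the one-hop expansion below are illustrative assumptions, not the paper's actual sampling procedure.

```python
# Hypothetical mini-database: two tables linked by a foreign key.
db = {
    "users":  {1: {"country": "DE"}},
    "orders": {10: {"user_id": 1, "amount": 30.0},
               11: {"user_id": 1, "amount": 12.5}},
}
fks = {"orders": {"user_id": "users"}}  # column -> referenced table

def gather_context(entity_table, entity_id):
    """Collect the task entity's row plus rows that reference it via a FK."""
    rows = [(entity_table, entity_id, db[entity_table][entity_id])]
    for table, cols in fks.items():
        for col, target in cols.items():
            if target == entity_table:
                rows += [(table, rid, row) for rid, row in db[table].items()
                         if row[col] == entity_id]
    return rows

context = gather_context("users", 1)
# A task row (e.g. "will user 1 churn?") would be appended with its label
# cell masked, so every downstream task looks identical to the model.
```

Because the task is expressed as data rather than as a task-specific head, no fine-tuning or in-context retrieval is needed at inference time.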
Relational Attention mechanism

The authors develop a specialized attention mechanism comprising column attention, feature attention, and neighbor attention layers that explicitly model dependencies across cells, rows, and tables by leveraging the relational structure of databases.

10 retrieved papers

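The three attention patterns can be sketched as boolean masks over cell tokens. This is a hypothetical reconstruction from the description above: the coordinate fields and the exact mask semantics (notably treating PK-FK neighborhood as symmetric) are assumptions, not the paper's implementation.

```python
import numpy as np

# Each tuple: (table, row_id, column, fk_target_row), where fk_target_row is
# the row this cell's row points to via a foreign key (None if no FK).
cells = [
    ("orders", 0, "amount",  2),
    ("orders", 0, "date",    2),
    ("orders", 1, "amount",  3),
    ("users",  2, "country", None),
    ("users",  3, "country", None),
]

def build_masks(cells):
    n = len(cells)
    col  = np.zeros((n, n), dtype=bool)  # column attention: same table and column
    feat = np.zeros((n, n), dtype=bool)  # feature attention: cells of the same row
    nbr  = np.zeros((n, n), dtype=bool)  # neighbor attention: rows linked by a PK-FK edge
    for i, (ti, ri, ci, fi) in enumerate(cells):
        for j, (tj, rj, cj, fj) in enumerate(cells):
            col[i, j]  = (ti == tj) and (ci == cj)
            feat[i, j] = ri == rj
            nbr[i, j]  = (fi == rj) or (fj == ri)
    return col, feat, nbr

col, feat, nbr = build_masks(cells)
```

Under this reading, each attention layer restricts a standard attention matrix to one mask, so dependencies across rows of a column, features of a record, and PK-FK-linked records are modeled by separate layers rather than by one dense all-to-all pattern.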
Cell-level tokenization with task table integration

The authors propose representing each database cell as a token with embeddings from its value, column name, and table name, combined with task table integration that augments the database with task-specific context, enabling all downstream tasks to be cast as masked token prediction.

6 retrieved papers
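The tokenization described above can be sketched as follows. The hash-seeded `name_embedding` is a deterministic stand-in for a learned text encoder, and summing the value, column-name, and table-name embeddings is an assumption about how the three are combined; only the one-token-per-cell idea and the masked label cell come from the contribution statement.

```python
import hashlib
import numpy as np

DIM = 8  # toy embedding width

def name_embedding(text, dim=DIM):
    """Deterministic stand-in for a learned text encoder (illustrative only)."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    return np.random.default_rng(seed).standard_normal(dim)

def cell_token(value, column, table):
    """One token per cell, carrying value, column-name, and table-name
    embeddings (combined here by summation, which is an assumption)."""
    return name_embedding(str(value)) + name_embedding(column) + name_embedding(table)

# Task-table integration: the task itself is stored as rows of an extra table,
# and the label cell is masked, so prediction reduces to masked token prediction.
MASK = np.zeros(DIM)
tokens = [
    cell_token("2024-01-07", "timestamp", "churn_task"),
    cell_token(42, "user_id", "churn_task"),
    MASK,  # masked label cell the model must fill in
]
```

Encoding column and table names into every token is what lets the same pretrained model read an unseen schema: the metadata embeddings carry the schema semantics that a fixed vocabulary of column positions could not.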

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Relational Transformer (RT) architecture for relational databases

Contribution: Relational Attention mechanism

Contribution: Cell-level tokenization with task table integration