Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data
Overview
Overall Novelty Assessment
The paper introduces a Relational Transformer architecture pretrained on diverse relational databases for zero-shot transfer to unseen datasets and tasks. It resides in the 'Relational Transfer Learning Foundations' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. This leaf focuses on core methods for transferring relational knowledge across domains with minimal target data, distinguishing it from query translation or database optimization branches that dominate other parts of the field.
The taxonomy tree positions this work within 'Cross-Domain and Cross-Task Transfer Mechanisms', adjacent to leaves addressing event reasoning and schema-guided dialog systems. Neighboring branches include 'Query Translation and Natural Language Interfaces' (heavily populated with LLM-based text-to-SQL methods) and 'Schema and Knowledge Structure Learning' (focused on schema inference and knowledge graph completion). The paper's cell-level tokenization and relational attention diverge from these directions by targeting architectural pretraining rather than prompt engineering or schema extraction, bridging neural representation learning with relational reasoning.
Among 26 candidates examined across the three contributions, none were flagged as clearly refuting the proposed methods. The Relational Transformer architecture and the Relational Attention mechanism each had 10 candidates reviewed with zero refutable overlaps, and the cell-level tokenization contribution was compared against 6 candidates with the same outcome. This suggests that within the limited search scope—primarily top-K semantic matches and citation expansion—the specific combination of cell-level pretraining, relational attention over primary-foreign key links, and zero-shot transfer to heterogeneous schemas appears relatively unexplored in prior work.
Based on the examined literature, the work occupies a sparsely populated niche combining architectural pretraining with relational structure encoding. The analysis covers a focused set of semantically related candidates rather than an exhaustive field survey, so conclusions reflect novelty within this bounded scope. The taxonomy context indicates that while transfer learning for relational data is an active area, foundational architectures enabling true zero-shot generalization across diverse schemas remain underrepresented compared to task-specific or prompt-based approaches.
Claimed Contributions
The authors introduce a novel transformer architecture designed specifically for relational databases that operates at the cell level, enabling pretraining on diverse databases and zero-shot transfer to new datasets and tasks without fine-tuning or retrieval of in-context examples.
The authors develop a specialized attention mechanism comprising column attention, feature attention, and neighbor attention layers that explicitly model dependencies across cells, rows, and tables by leveraging the relational structure of databases.
The authors propose representing each database cell as a token with embeddings from its value, column name, and table name, combined with task table integration that augments the database with task-specific context, enabling all downstream tasks to be cast as masked token prediction.
Contribution Analysis
Detailed comparisons for each claimed contribution
Relational Transformer (RT) architecture for relational databases
The authors introduce a novel transformer architecture designed specifically for relational databases that operates at the cell level, enabling pretraining on diverse databases and zero-shot transfer to new datasets and tasks without fine-tuning or retrieval of in-context examples.
[1] AMAZe: A Multi-Agent Zero-shot Index Advisor for Relational Databases
[53] Zero-shot Temporal Relation Extraction with ChatGPT
[54] GLiREL: Generalist Model for Zero-Shot Relation Extraction
[55] Quantizing text-attributed graphs for semantic-structural integration
[56] Zero shot health trajectory prediction using transformer
[57] CrashSage: A Large Language Model-Centered Framework for Contextual and Interpretable Traffic Crash Analysis
[58] AnyGraph: Graph Foundation Model in the Wild
[59] Zero-Shot Knowledge Extraction with Hierarchical Attention and an Entity-Relationship Transformer
[60] Revisiting Large Language Models as Zero-shot Relation Extractors
[61] A Zero-Shot Framework for Low-Resource Relation Extraction via Distant Supervision and Large Language Models
Relational Attention mechanism
The authors develop a specialized attention mechanism comprising column attention, feature attention, and neighbor attention layers that explicitly model dependencies across cells, rows, and tables by leveraging the relational structure of databases.
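The three attention scopes described above can be illustrated as restricted attention patterns over cell tokens. The sketch below is an assumption-laden simplification, not the authors' implementation: it builds boolean attention masks in which column attention connects cells sharing a table column, feature attention connects cells of the same row, and neighbor attention connects cells of rows joined by a primary-foreign key link (the `Cell` structure and `fk_links` encoding are hypothetical).

```python
# Hypothetical sketch of the three relational attention scopes; the Cell
# structure and fk_links encoding are illustrative assumptions, not the
# paper's code. Each mask[i][j] says whether token i may attend to token j.
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    table: str
    row: int   # row identifier within its table
    col: str   # column name


def column_mask(cells):
    # Column attention: cells in the same column of the same table.
    return [[a.table == b.table and a.col == b.col for b in cells] for a in cells]


def feature_mask(cells):
    # Feature attention: cells of the same row (one entity's features).
    return [[a.table == b.table and a.row == b.row for b in cells] for a in cells]


def neighbor_mask(cells, fk_links):
    # Neighbor attention: cells of rows joined by a primary-foreign key link.
    # fk_links is a set of ((table, row), (table, row)) pairs.
    def linked(a, b):
        return ((a.table, a.row), (b.table, b.row)) in fk_links

    return [[linked(a, b) or linked(b, a) for b in cells] for a in cells]
```

In a full transformer these masks would gate separate attention layers (e.g. as the `attn_mask` of a standard scaled dot-product attention), so each layer mixes information only along one kind of relational dependency.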
[43] Entity structure within and throughout: Modeling mention dependencies for document-level relation extraction
[44] TransTab: Learning Transferable Tabular Transformers Across Tables
[45] Multi-modal attention based on 2d structured sequence for table recognition
[46] Retrieval-augmented forecasting with tabular time series data
[47] Tuta: Tree-based transformers for generally structured table pre-training
[48] Relational Graph Transformer
[49] Joint entity-relation extraction for natural disaster based on table filling
[50] MATE: Multi-view Attention for Table Transformer Efficiency
[51] TSRDet: A Table Structure Recognition Method Based on Row-Column Detection
[52] Improving the Automated Diagnosis of Breast Cancer with Mesh Reconstruction of Ultrasound Images Incorporating 3D Mesh Features and a Graph Attention Network
Cell-level tokenization with task table integration
The authors propose representing each database cell as a token with embeddings from its value, column name, and table name, combined with task table integration that augments the database with task-specific context, enabling all downstream tasks to be cast as masked token prediction.
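The tokenization scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the hash-based `_embed` stands in for learned embedding tables, the embeddings are combined by summation (one plausible choice), and the task table is reduced to a single appended cell whose value is masked for prediction.

```python
# Illustrative sketch of cell-level tokenization; _embed, DIM, MASK, and the
# summation of the three embeddings are assumptions, not the authors' design.
import hashlib

DIM = 8
MASK = "[MASK]"


def _embed(text, dim=DIM):
    # Toy deterministic text embedding via hashing, standing in for a
    # learned embedding lookup.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]


def cell_token(value, column, table):
    # Each cell token combines embeddings of its value, column name, and
    # table name.
    v, c, t = _embed(str(value)), _embed(column), _embed(table)
    return [x + y + z for x, y, z in zip(v, c, t)]


def tokenize_with_task(db_cells, task_cell):
    # db_cells: list of (value, column, table) triples from the database.
    # task_cell: (column, table) of the target; its value is masked, so the
    # downstream task becomes masked token prediction over this cell.
    tokens = [cell_token(v, c, t) for v, c, t in db_cells]
    tokens.append(cell_token(MASK, *task_cell))
    return tokens
```

Casting every downstream task as predicting the masked value of an appended task cell is what lets one pretrained model serve heterogeneous tasks without task-specific heads.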