Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: biology foundation model, biomolecular interaction prediction, representation learning
Abstract:

Biomolecular interactions play a critical role in biological processes. While recent breakthroughs like AlphaFold 3 have enabled accurate modeling of biomolecular complex structures, predicting binding affinity remains challenging mainly due to limited high-quality data. Recent methods are often specialized for specific types of biomolecular interactions, limiting their generalizability. In this work, we repurpose AlphaFold 3 for representation learning to predict binding affinity, a non-trivial task that requires shifting from generative structure prediction to encoding observed geometry, simplifying the heavily conditioned trunk module, and designing a framework to jointly capture sequence and structural information. To address these challenges, we introduce the Atom-level Diffusion Transformer (ADiT), which takes sequence and structure as inputs, employs a unified tokenization scheme, integrates diffusion transformers, and removes dependencies on multiple sequence alignments and templates. We pre-train three ADiT variants on the PDB dataset with a denoising objective and evaluate them across protein-ligand, drug-target, protein-protein, and antibody-antigen interactions. The model achieves state-of-the-art or competitive performance across benchmarks, scales effectively with model size, and successfully identifies wet-lab validated affinity-enhancing antibody mutations, establishing a generalizable framework for biomolecular interactions. We plan to release the code upon acceptance.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces ADiT, an atom-level diffusion transformer that repurposes AlphaFold 3's architecture for binding affinity prediction across multiple biomolecular interaction types. It resides in the 'Diffusion Transformer and Foundation Models' leaf, which contains only two papers in the entire 50-paper taxonomy. This sparse population suggests the work occupies an emerging research direction where large-scale foundation models are being adapted from generative structure prediction to affinity estimation tasks, rather than the more crowded graph neural network or convolutional branches.

The taxonomy reveals that most structure-based deep learning methods cluster in graph neural network subcategories (interaction-aware, multi-scale, contrastive) and convolutional approaches, which together account for roughly a dozen papers. Transformer and attention-based architectures form a smaller adjacent branch with two papers, while sequence-based and hybrid feature methods constitute another major direction. ADiT diverges from these by combining diffusion processes with transformer blocks at atomic resolution, positioning itself closer to generative modeling paradigms than to task-specific graph or grid-based encoders.

Among 26 candidates examined, the pre-training and fine-tuning framework (Contribution 2) encountered two refutable candidates, indicating that denoising objectives on PDB data have precedent in the limited search scope. The core ADiT architecture (Contribution 1) and the AlphaFold 3 adaptation strategy (Contribution 3) each examined 10 candidates with zero refutations, suggesting these specific design choices—unified tokenization, removal of MSA dependencies, and the shift from generative to representation learning—appear less directly overlapping with prior work in the top-26 semantic matches.

Based on this limited search of 26 candidates, the work appears to introduce novel architectural adaptations in a sparsely populated research area, though the pre-training strategy shows some overlap with existing foundation model efforts. The analysis does not cover the full literature landscape, and a broader search might reveal additional related work in generative modeling or protein language model domains.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 26
Refutable papers: 2

Research Landscape Overview

Core task: biomolecular binding affinity prediction. The field has evolved from classical physics-based computational methods and mathematical descriptors toward a rich ecosystem of machine learning approaches. At the top level, the taxonomy distinguishes deep learning architectures for structure-based prediction, sequence-based and hybrid feature methods, specialized learning strategies, mathematical and topological descriptors, classical computational techniques, and domain-specific interaction contexts (e.g., protein–protein, protein–ligand, or nucleic acid binding). Deep learning architectures have become particularly prominent, with branches exploring graph neural networks, convolutional models, attention mechanisms, and more recently diffusion-based and foundation model paradigms. Sequence-based methods often leverage pre-trained language models or hybrid encodings that combine sequence and structural information, while specialized learning strategies address challenges such as limited data, multi-task learning, and contrastive objectives. Mathematical and topological approaches (e.g., persistent homology) offer interpretable geometric features, and classical methods remain relevant for benchmarking and physics-informed priors.

Recent years have seen growing interest in large-scale pre-training and foundation models that can generalize across diverse biomolecular contexts. All-Atom Foundation Models[0] exemplifies this trend by learning representations at the atomic level, aiming for broad applicability to various binding prediction tasks. This work sits within the diffusion transformer and foundation model branch, closely related to efforts like Boltz-2[1], which also explores generative and predictive modeling of biomolecular structures.
In contrast, many other deep learning methods focus on task-specific architectures—such as graph convolutions for protein–ligand complexes (e.g., MGraphDTA[9], Structure-Aware Interactive[10]) or attention-based fusion strategies (DeepDTAF[4], Deep Fusion Inference[11])—that excel in narrower settings but may require retraining for new interaction types. The shift toward foundation models reflects an ambition to unify disparate prediction tasks under a single learned prior, though open questions remain about how well such models capture fine-grained energetic details compared to specialized or physics-based approaches.

Claimed Contributions

Atom-level Diffusion Transformer (ADiT) for biomolecular binding affinity prediction

The authors propose ADiT, a unified foundation model that accepts both sequence and structure inputs for diverse biomolecular interactions. The model uses a unified tokenization scheme for proteins and molecules, incorporates diffusion transformers for multi-level representation learning, and, unlike AlphaFold 3, removes the dependence on multiple sequence alignments (MSAs) and templates.

10 retrieved papers
Pre-training and fine-tuning framework with denoising objective on PDB dataset

The authors develop a two-stage training approach where ADiT models are first pre-trained on large-scale PDB data using a denoising self-supervised objective, then fine-tuned for downstream binding affinity prediction tasks across multiple interaction types.

6 retrieved papers
Can Refute (2 refutable candidates)
Adaptation strategy for converting AlphaFold 3 from generative to representation learning

The authors present a non-trivial adaptation strategy that transforms AlphaFold 3's generative architecture into a representation learner by simplifying the conditioning module, leveraging the transformer-based atom and sequence-level architecture, and using large-scale pre-training to address data scarcity in affinity prediction.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Atom-level Diffusion Transformer (ADiT) for biomolecular binding affinity prediction

The authors propose ADiT, a unified foundation model that accepts both sequence and structure inputs for diverse biomolecular interactions. The model uses a unified tokenization scheme for proteins and molecules, incorporates diffusion transformers for multi-level representation learning, and, unlike AlphaFold 3, removes the dependence on multiple sequence alignments (MSAs) and templates.
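To make the idea of a unified tokenization scheme concrete, the sketch below is a hypothetical, heavily simplified illustration (not the authors' code): both protein residues and small-molecule ligands are expanded into per-atom tokens drawn from one shared element vocabulary, so a single transformer could attend across the whole complex. All names (ATOM_VOCAB, RESIDUE_ATOMS, the backbone-only residue expansions) are illustrative assumptions.

```python
# Hypothetical sketch of unified atom-level tokenization for a
# protein-ligand complex: one shared vocabulary for every atom.

ATOM_VOCAB = {"C": 0, "N": 1, "O": 2, "S": 3, "P": 4, "H": 5}

# Minimal residue -> heavy-atom expansion (backbone only, for brevity;
# a real scheme would enumerate full side chains per residue type).
RESIDUE_ATOMS = {
    "GLY": ["N", "C", "C", "O"],        # N, CA, C, O
    "ALA": ["N", "C", "C", "O", "C"],   # backbone + CB
}

def tokenize_protein(sequence):
    """Expand a residue sequence into a flat list of atom tokens."""
    return [ATOM_VOCAB[a] for res in sequence for a in RESIDUE_ATOMS[res]]

def tokenize_ligand(atoms):
    """Ligand atoms are already atomic; map elements directly."""
    return [ATOM_VOCAB[a] for a in atoms]

# Protein and ligand tokens live in the same space and can be
# concatenated into one sequence for a shared transformer.
complex_tokens = tokenize_protein(["GLY", "ALA"]) + tokenize_ligand(["C", "O", "N"])
```

The design point this illustrates is that no modality-specific encoder is needed: once residues are expanded to atoms, proteins and small molecules become the same kind of input.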

Contribution

Pre-training and fine-tuning framework with denoising objective on PDB dataset

The authors develop a two-stage training approach where ADiT models are first pre-trained on large-scale PDB data using a denoising self-supervised objective, then fine-tuned for downstream binding affinity prediction tasks across multiple interaction types.
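The denoising self-supervised objective described above can be sketched in a minimal, hypothetical form (not the authors' implementation): Gaussian noise is added to atom coordinates, a model predicts the clean coordinates, and the loss is the mean squared error of the reconstruction. The toy_denoiser here is a stand-in; in ADiT that role would be played by the diffusion transformer.

```python
import random

random.seed(0)

def denoising_loss(coords, sigma, denoiser):
    """One step of a coordinate-denoising objective: perturb atom
    positions with Gaussian noise, ask a model to recover the clean
    coordinates, and score the reconstruction with MSE."""
    noisy = [[x + random.gauss(0.0, sigma) for x in atom] for atom in coords]
    pred = denoiser(noisy, sigma)
    n = sum(len(atom) for atom in coords)
    return sum((p - x) ** 2
               for pa, ca in zip(pred, coords)
               for p, x in zip(pa, ca)) / n

def toy_denoiser(noisy, sigma):
    """Toy stand-in model: shrink each coordinate toward the
    structure centroid, more aggressively at higher noise levels."""
    dim = len(noisy[0])
    centroid = [sum(atom[d] for atom in noisy) / len(noisy) for d in range(dim)]
    shrink = 1.0 / (1.0 + sigma ** 2)
    return [[centroid[d] + (atom[d] - centroid[d]) * shrink
             for d in range(dim)] for atom in noisy]

coords = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(16)]  # 16 atoms, xyz
loss = denoising_loss(coords, sigma=0.5, denoiser=toy_denoiser)
```

In a two-stage setup of this kind, the pre-training phase would minimize this loss over large-scale PDB structures, after which the learned encoder is fine-tuned on the (much smaller) labeled affinity datasets.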

Contribution

Adaptation strategy for converting AlphaFold 3 from generative to representation learning

The authors present a non-trivial adaptation strategy that transforms AlphaFold 3's generative architecture into a representation learner by simplifying the conditioning module, leveraging the transformer-based atom and sequence-level architecture, and using large-scale pre-training to address data scarcity in affinity prediction.