Towards All-Atom Foundation Models for Biomolecular Binding Affinity Prediction
Overview
Overall Novelty Assessment
The paper introduces ADiT, an atom-level diffusion transformer that repurposes AlphaFold 3's architecture for binding affinity prediction across multiple biomolecular interaction types. It resides in the 'Diffusion Transformer and Foundation Models' leaf, which contains only two papers in the entire 50-paper taxonomy. This sparse population suggests the work occupies an emerging research direction where large-scale foundation models are being adapted from generative structure prediction to affinity estimation tasks, rather than the more crowded graph neural network or convolutional branches.
The taxonomy reveals that most structure-based deep learning methods cluster in graph neural network subcategories (interaction-aware, multi-scale, contrastive) and convolutional approaches, which together account for roughly a dozen papers. Transformer and attention-based architectures form a smaller adjacent branch with two papers, while sequence-based and hybrid feature methods constitute another major direction. ADiT diverges from these by combining diffusion processes with transformer blocks at atomic resolution, positioning itself closer to generative modeling paradigms than to task-specific graph or grid-based encoders.
Among the 26 candidates examined, the pre-training and fine-tuning framework (Contribution 2) surfaced two candidates judged to refute its novelty, indicating that denoising objectives on PDB data have precedent within the limited search scope. The core ADiT architecture (Contribution 1) and the AlphaFold 3 adaptation strategy (Contribution 3) were each checked against 10 candidates with zero refutations, suggesting that these specific design choices, namely unified tokenization, removal of MSA dependencies, and the shift from generative to representation learning, overlap less directly with prior work among the top-26 semantic matches.
Based on this limited search of 26 candidates, the work appears to introduce novel architectural adaptations in a sparsely populated research area, though the pre-training strategy shows some overlap with existing foundation model efforts. The analysis does not cover the full literature landscape, and a broader search might reveal additional related work in generative modeling or protein language model domains.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose ADiT, a unified foundation model that accepts both sequence and structure inputs for diverse biomolecular interactions. The model applies a unified tokenization scheme to proteins and molecules, incorporates diffusion transformers for multi-level representation learning, and, unlike AlphaFold 3, eliminates the need for MSAs and templates.
The authors develop a two-stage training approach where ADiT models are first pre-trained on large-scale PDB data using a denoising self-supervised objective, then fine-tuned for downstream binding affinity prediction tasks across multiple interaction types.
The authors present a non-trivial adaptation strategy that transforms AlphaFold 3's generative architecture into a representation learner by simplifying the conditioning module, retaining the transformer-based atom- and sequence-level architecture, and using large-scale pre-training to address data scarcity in affinity prediction.
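The unified tokenization claimed in the first contribution can be illustrated with a minimal sketch. Assuming (the paper does not specify the exact scheme) that proteins and small molecules are mapped into one shared atom-level vocabulary keyed by element type, both entity kinds reduce to sequences of the same integer tokens; the names `ATOM_VOCAB` and `tokenize_entity` are illustrative, not taken from the paper.

```python
# Hypothetical sketch of a unified atom-level tokenization scheme:
# proteins and small molecules share one vocabulary keyed by element
# type, so both become sequences drawn from the same token ID space.

ATOM_VOCAB = {elem: i for i, elem in enumerate(["C", "N", "O", "S", "P", "H"])}
UNK = len(ATOM_VOCAB)  # shared fallback token for rare elements

def tokenize_entity(atoms):
    """Map a list of element symbols (protein or ligand) to integer tokens."""
    return [ATOM_VOCAB.get(a, UNK) for a in atoms]

# A glycine backbone and a methanol ligand tokenize into the same ID space:
glycine = ["N", "C", "C", "O"]   # backbone heavy atoms
methanol = ["C", "O"]
print(tokenize_entity(glycine))   # [1, 0, 0, 2]
print(tokenize_entity(methanol))  # [0, 2]
```

The point of the sketch is that no entity-type-specific vocabulary is needed, which is what lets a single transformer trunk process protein-ligand, protein-protein, and other interaction types uniformly.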
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Boltz-2: Towards accurate and efficient binding affinity prediction
Contribution Analysis
Detailed comparisons for each claimed contribution
Atom-level Diffusion Transformer (ADiT) for biomolecular binding affinity prediction
The authors propose ADiT, a unified foundation model that accepts both sequence and structure inputs for diverse biomolecular interactions. The model applies a unified tokenization scheme to proteins and molecules, incorporates diffusion transformers for multi-level representation learning, and, unlike AlphaFold 3, eliminates the need for MSAs and templates.
[53] DiffBP: Generative diffusion of 3D molecules for target protein binding
[66] A unified conditional diffusion framework for dual protein targets-based bioactive molecule generation
[67] AptaDiff: de novo design and optimization of aptamers based on diffusion models
[68] ProteinReDiff: Complex-based ligand-binding proteins redesign by equivariant diffusion-based generative models
[69] DTITR: End-to-end drug-target binding affinity prediction with transformers
[70] MolSculptor: an adaptive diffusion-evolution framework enabling generative drug design for multi-target affinity and selectivity
[71] ProtT-Affinity: Sequence-based protein-protein binding affinity prediction using ProtT5 embeddings
[72] High-performance binding affinity prediction with a transformer-based surrogate model
[73] General binding affinity guidance for diffusion models in structure-based drug design
[74] Generative models in protein engineering: A comprehensive survey
Pre-training and fine-tuning framework with denoising objective on PDB dataset
The authors develop a two-stage training approach where ADiT models are first pre-trained on large-scale PDB data using a denoising self-supervised objective, then fine-tuned for downstream binding affinity prediction tasks across multiple interaction types.
[62] SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design
[65] Pre-training protein models with molecular dynamics simulations for drug binding
[37] Harnessing pre-trained models for accurate prediction of protein-ligand binding affinity
[61] Joint design of protein surface and backbone using a diffusion bridge model
[63] Full-atom peptide design with geometric latent diffusion
[64] Protein A-like peptide generation based on generalized diffusion model
Adaptation strategy for converting AlphaFold 3 from generative to representation learning
The authors present a non-trivial adaptation strategy that transforms AlphaFold 3's generative architecture into a representation learner by simplifying the conditioning module, retaining the transformer-based atom- and sequence-level architecture, and using large-scale pre-training to address data scarcity in affinity prediction.
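The generative-to-representation conversion can be made concrete with a hedged sketch: instead of decoding coordinates, the per-atom embeddings produced by the transformer trunk are pooled into a complex-level vector and passed to a scalar regression head. The mean-pooling and linear head below are illustrative stand-ins, not the paper's actual layers.

```python
import numpy as np

rng = np.random.default_rng(1)

def affinity_head(atom_embeddings, w, b):
    """Mean-pool per-atom embeddings, then apply a linear regression head."""
    pooled = atom_embeddings.mean(axis=0)   # (d,) complex-level vector
    return float(pooled @ w + b)            # scalar predicted affinity

d = 8
embeddings = rng.standard_normal((32, d))   # 32 atom embeddings from the trunk
w, b = rng.standard_normal(d), 0.0
pred = affinity_head(embeddings, w, b)      # single predicted affinity value
```

The design choice this illustrates is that the expensive diffusion decoder can be discarded at fine-tuning time: only the trunk's intermediate representations are reused, which is what makes the pre-trained generative weights transferable to a discriminative task.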