Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: binder design, protein design, flow matching, hallucination, inference-time scaling, generative modeling, diffusion models
Abstract:

Protein interaction modeling is central to protein design, a field transformed by machine learning with broad applications in drug discovery and beyond. In this landscape, structure-based de novo binder design is most often cast as either conditional generative modeling or sequence optimization via structure predictors ("hallucination"). We argue that this is a false dichotomy and propose Complexa, a novel fully atomistic binder generation method that unifies both paradigms. We extend a recent flow-based latent protein generation architecture and leverage the domain-domain interactions of computationally predicted monomeric protein structures to construct Teddymer, a new large-scale dataset of synthetic binder-target pairs for pretraining. Combined with high-quality experimental multimers, this enables training a strong base model. We then perform inference-time optimization with this generative prior, unifying the strengths of previously distinct generative and hallucination methods. Complexa sets a new state of the art on computational binder design benchmarks: it delivers markedly higher in-silico success rates than existing generative approaches, and our novel test-time optimization strategies greatly outperform previous hallucination methods under normalized compute budgets. We further demonstrate explicit interface hydrogen-bond optimization, fold-class-guided binder generation, and extensions to small-molecule targets and enzyme design tasks, again surpassing prior methods. Code, models, and new data will be publicly released.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Complexa, a method unifying generative modeling and hallucination-based optimization for protein binder design. It sits within the 'Diffusion and Flow-Based Generative Models' leaf, which contains only three papers total, indicating a relatively focused but not overcrowded research direction. The taxonomy shows this leaf is one of four under 'Generative Model Architectures and Training Paradigms', suggesting the field has diversified into multiple architectural paradigms rather than concentrating heavily in any single approach.

The taxonomy reveals neighboring leaves include 'Protein Language Models for Binder Generation' (two papers), 'AlphaFold-Based Hallucination and Inversion' (four papers), and 'Hybrid and Multi-Scale Frameworks' (two papers). Complexa's claim to unify generative and hallucination paradigms positions it at the boundary between the diffusion-based leaf and the AlphaFold hallucination branch. The 'Inference-Time Optimization and Filtering Strategies' branch (three papers across two leaves) is also relevant, as Complexa incorporates test-time optimization. This cross-cutting positioning suggests the work bridges previously distinct methodological clusters.

Among thirty candidates examined, the analysis identified one refutable pair for the Teddymer dataset contribution, while the unification framework and Complexa architecture showed no clear refutations across ten candidates each. The Teddymer dataset—synthetic binder-target pairs from domain-domain interactions—appears to have some overlap with prior synthetic training data efforts. The core methodological contributions (unifying generative and hallucination, latent target conditioning with test-time optimization) appear more distinctive within the limited search scope, though the analysis does not claim exhaustive coverage of all relevant prior work.

Based on the top-thirty semantic matches and taxonomy structure, Complexa appears to occupy a relatively novel position by explicitly bridging generative and hallucination paradigms. The limited search scope means potentially relevant work outside these candidates may exist. The taxonomy's modest leaf size (three papers) and the cross-branch positioning suggest the work addresses a recognized gap, though the Teddymer dataset shows some precedent in synthetic training data construction.

Taxonomy

- Core-task taxonomy papers: 50
- Claimed contributions: 3
- Contribution candidate papers compared: 30
- Refutable papers: 1

Research Landscape Overview

Core task: de novo protein binder design. The field has matured from early computational and experimental efforts into a rich ecosystem organized around several complementary strategies. At the highest level, the taxonomy distinguishes generative model architectures, including diffusion- and flow-based frameworks that learn to sample novel binder structures, from inference-time optimization methods that refine candidates through energy-based filtering or iterative search. Parallel branches address target-specific applications (designing binders for particular therapeutic or diagnostic targets), computational pipelines that automate multi-step workflows, and specialized modalities such as antibody-specific methods, template-based approaches, and small-molecule binding sites. Additional branches capture experimental screening platforms and broader reviews that synthesize methodological advances.

Representative works like RFdiffusion[7] exemplify the power of diffusion models, while AlphaProteo[2] and related efforts demonstrate how large-scale training can yield high-affinity binders across diverse targets. Within the generative model branch, diffusion- and flow-based methods have become particularly active, balancing the need for structural realism with computational efficiency. RFdiffusion[7] pioneered the application of denoising diffusion to protein backbone generation, enabling flexible hotspot-constrained design, whereas Flow Matching Design[23] explores continuous normalizing flows for smoother sampling trajectories.

Scaling Atomistic Binder[0] sits squarely in this diffusion-and-flow cluster, emphasizing atomistic resolution and scalable training regimes that push beyond earlier coarse-grained or backbone-only models. Compared to RFdiffusion[7], which focuses on backbone geometry, Scaling Atomistic Binder[0] incorporates finer chemical detail to improve predicted binding interfaces. In contrast to Flow Matching Design[23], which applies flow matching at the backbone level, Scaling Atomistic Binder[0] pairs flow matching with explicit atom-level representations. These distinctions reflect ongoing exploration of how best to encode physical constraints and sampling efficiency in generative frameworks, a central question as the field moves toward clinically validated therapeutics and high-throughput experimental validation.

Claimed Contributions

Unifying generative modeling and hallucination methods for binder design

The authors introduce a unified framework that integrates flow-based generative modeling with inference-time optimization, bridging the gap between conditional generation approaches and structure predictor-based hallucination methods that were previously treated as separate paradigms in protein binder design.

10 retrieved papers
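As a concrete illustration of the unified view, the simplest way to use inference-time optimization on top of a generative prior is best-of-N sampling: candidates drawn from the generative model are ranked by a hallucination-style objective such as a structure predictor's confidence. The sketch below is illustrative only; `sample_prior` and `score` are placeholder names standing in for the generative model and scoring function, which this report does not specify.

```python
def best_of_n(sample_prior, score, n=64):
    """Draw n candidate binders from a generative prior and return the one
    that maximizes a hallucination-style objective (e.g. a structure
    predictor's interface confidence)."""
    candidates = [sample_prior() for _ in range(n)]
    return max(candidates, key=score)
```

More elaborate test-time strategies replace the single ranking pass with search over partial generation trajectories, but the division of labor is the same: the generative model proposes, the predictor-style score selects.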
Teddymer dataset of synthetic protein dimers

The authors construct a large-scale dataset of 3.5 million synthetic binder-target pairs by partitioning AlphaFold Database monomers into structural domains using TED annotations and assembling artificial dimers from domain-domain interactions, providing training data for atomistic binder generation.

10 retrieved papers
Can Refute
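The construction described above can be sketched in a few lines. This is a minimal illustration, assuming domains are given as residue ranges over a C-alpha trace and that an interacting pair is defined by a simple contact-count threshold; the actual TED-based pipeline, distance cutoffs, and quality filters are not specified in this report.

```python
import numpy as np

def contact_count(dom_a, dom_b, cutoff=8.0):
    """Number of CA-CA pairs between two domains closer than cutoff (angstroms)."""
    dists = np.linalg.norm(dom_a[:, None, :] - dom_b[None, :, :], axis=-1)
    return int((dists < cutoff).sum())

def extract_synthetic_dimers(ca_coords, domain_ranges, min_contacts=10):
    """Partition a monomer's CA trace into domains and keep every pair of
    domains with a substantial interface as a synthetic binder-target dimer."""
    domains = [ca_coords[start:end] for start, end in domain_ranges]
    dimers = []
    for i in range(len(domains)):
        for j in range(i + 1, len(domains)):
            if contact_count(domains[i], domains[j]) >= min_contacts:
                dimers.append((i, j))
    return dimers
```

Applied across millions of predicted monomers, a procedure of this shape yields binder-target pairs without requiring experimentally solved complexes, which is the precedent the refutation analysis flags.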
Complexa framework with latent target conditioning and test-time optimization

The authors develop Complexa, a fully atomistic binder generation method that extends La-Proteína's partially latent flow matching architecture with novel latent target conditioning and implements multiple test-time compute scaling algorithms (beam search, Feynman-Kac steering, Monte Carlo Tree Search) to optimize binder quality during inference.

10 retrieved papers
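Of the test-time scaling algorithms listed, beam search is the most direct to sketch. The generic loop below assumes a stochastic `step_fn` (one generation step of the sampler) and a `score_fn` (any cheap proxy of binder quality); both names are placeholders, not the paper's API.

```python
def beam_search(init_state, step_fn, score_fn, n_steps, beam_width=4, branch=2):
    """Beam search over stochastic generation trajectories: expand every
    beam member into several candidate continuations at each step, then
    keep only the highest-scoring candidates."""
    beam = [init_state]
    for _ in range(n_steps):
        candidates = [step_fn(state) for state in beam for _ in range(branch)]
        candidates.sort(key=score_fn, reverse=True)
        beam = candidates[:beam_width]
    return max(beam, key=score_fn)
```

Feynman-Kac steering and Monte Carlo Tree Search fill the same role with different exploration-exploitation trade-offs, which is why they can be compared under a normalized compute budget.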

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Unifying generative modeling and hallucination methods for binder design

The authors introduce a unified framework that integrates flow-based generative modeling with inference-time optimization, bridging the gap between conditional generation approaches and structure predictor-based hallucination methods that were previously treated as separate paradigms in protein binder design.

Contribution

Teddymer dataset of synthetic protein dimers

The authors construct a large-scale dataset of 3.5 million synthetic binder-target pairs by partitioning AlphaFold Database monomers into structural domains using TED annotations and assembling artificial dimers from domain-domain interactions, providing training data for atomistic binder generation.

Contribution

Complexa framework with latent target conditioning and test-time optimization

The authors develop Complexa, a fully atomistic binder generation method that extends La-Proteína's partially latent flow matching architecture with novel latent target conditioning and implements multiple test-time compute scaling algorithms (beam search, Feynman-Kac steering, Monte Carlo Tree Search) to optimize binder quality during inference.
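Feynman-Kac steering, mentioned above, can be sketched as sequential importance resampling over a particle population. This is a generic SMC-style illustration under assumed names (`step_fn` for one sampler step, `potential` for the reward being steered toward), not the paper's implementation.

```python
import math
import random

def fk_steering(init_particles, step_fn, potential, n_steps, lam=1.0):
    """Steer a base sampler with a Feynman-Kac potential: propagate each
    particle one step, weight it by exp(lam * increase in potential), and
    resample the population in proportion to the weights, so promising
    trajectories are duplicated and poor ones are dropped."""
    particles = list(init_particles)
    for _ in range(n_steps):
        proposals = [step_fn(p) for p in particles]
        weights = [math.exp(lam * (potential(q) - potential(p)))
                   for p, q in zip(particles, proposals)]
        particles = random.choices(proposals, weights=weights, k=len(proposals))
    return max(particles, key=potential)
```

Unlike beam search, which hard-prunes to the top scorers, this scheme keeps a stochastic population whose distribution is tilted toward high-potential regions, with `lam` controlling the strength of the tilt.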