Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: protein design, preference alignment
Abstract:

Protein sequence design must balance designability, defined as the ability to recover a target backbone, with multiple, often competing, developability properties such as solubility, thermostability, and expression. Existing approaches address these properties through post hoc mutation, inference-time biasing, or retraining on property-specific subsets, yet they are target dependent and demand substantial domain expertise or careful hyperparameter tuning. In this paper, we introduce ProtAlign, a multi-objective preference alignment framework that fine-tunes pretrained inverse folding models to satisfy diverse developability objectives while preserving structural fidelity. ProtAlign employs a semi-online Direct Preference Optimization strategy with a flexible preference margin to mitigate conflicts among competing objectives and constructs preference pairs using in silico property predictors. Applied to the widely used ProteinMPNN backbone, the resulting model MoMPNN enhances developability without compromising designability across tasks including sequence design for CATH 4.3 crystal structures, de novo generated backbones, and real-world binder design scenarios, making it an appealing framework for practical protein sequence design.
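
The abstract states that preference pairs are constructed from in silico property predictors but does not say by what rule. As a minimal sketch, assuming (hypothetically) that one sequence is preferred over another only when it is at least as good on every objective and strictly better on at least one, pair construction for a single backbone could look as follows. The scores, sequences, and function names are illustrative placeholders, not the authors' pipeline.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Objective name -> predicted score, higher is better (hypothetical predictor outputs).
Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def build_preference_pairs(
    candidates: List[Tuple[str, Scores]]
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) sequence pairs among candidates for one backbone."""
    pairs = []
    for (seq_a, s_a), (seq_b, s_b) in combinations(candidates, 2):
        if dominates(s_a, s_b):
            pairs.append((seq_a, seq_b))
        elif dominates(s_b, s_a):
            pairs.append((seq_b, seq_a))
        # Otherwise the two sequences trade objectives off against each other,
        # so this sketch emits no pair for them.
    return pairs


# Illustrative candidates with made-up scores.
candidates = [
    ("MKTAYIAK", {"designability": 0.90, "solubility": 0.70}),
    ("MKTLLVAG", {"designability": 0.80, "solubility": 0.60}),
    ("MSEELLKK", {"designability": 0.70, "solubility": 0.95}),
]
pairs = build_preference_pairs(candidates)
print(pairs)
```

Under this assumed rule, only the first sequence is paired against the second; the third trades designability for solubility and is left unpaired. Handling such conflicting candidates is the kind of case the flexible preference margin is described as addressing.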

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes the tasks and contributions of academic papers against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces ProtAlign, a multi-objective preference alignment framework that fine-tunes inverse folding models to balance designability with developability properties such as solubility and thermostability. It resides in the Multi-Objective Preference Alignment leaf, which contains three papers including the original work. This leaf sits within the broader Preference-Based Optimization Methods branch, indicating a moderately populated research direction focused on aligning generative models with multiple objectives through preference signals rather than post-hoc filtering or single-objective optimization.

The taxonomy reveals neighboring approaches in sibling leaves: Direct Preference Optimization for Designability focuses solely on structural fidelity using confidence scores, while Guided Generation and Sampling Methods employ classifier guidance or MCMC strategies without explicit preference learning. The Multi-Objective Preference Alignment leaf explicitly excludes single-objective methods, positioning this work at the intersection of structural accuracy and practical therapeutic constraints. Related branches like Antibody-Specific Design Pipelines and Developability Prediction provide complementary tools for property assessment, but the preference alignment approach distinguishes itself by integrating objectives during model training rather than relying on external guidance or iterative refinement.

Seventeen candidate papers were examined across the three contributions. For the ProtAlign framework, the single candidate examined was judged refutable, suggesting that some prior work already addresses multi-objective preference alignment for protein design. For the semi-online Direct Preference Optimization strategy, ten candidates were examined and none clearly refutes it, indicating potential novelty in the specific training procedure and flexible margin mechanism. For the MoMPNN model, six candidates were examined without clear refutation, though the limited search scope means substantial related work may exist beyond the top semantic matches. These statistics suggest that the framework's novelty lies more in its training methodology than in the general concept of multi-objective protein design.

Based on a limited literature search covering seventeen candidates, the work appears to occupy a moderately explored niche within preference-based protein design. The taxonomy structure shows this is an active area with established sibling methods, but the specific combination of semi-online DPO and flexible margins may offer incremental advances. The analysis does not exhaustively cover prior work in reinforcement learning for proteins or the broader multi-objective optimization literature, leaving open how this approach compares to methods outside the semantic search scope.

Taxonomy

Core-task Taxonomy Papers: 35
Claimed Contributions: 3
Contribution Candidate Papers Compared: 17
Refutable Paper: 1

Research Landscape Overview

Core task: Multi-objective protein sequence design balancing designability and developability properties.

The field has organized itself around several complementary strategies for navigating the tension between generating structurally sound proteins and ensuring they meet practical therapeutic or functional criteria. Preference-Based Optimization Methods focus on aligning generative models with multiple objectives through techniques like reinforcement learning and preference learning, enabling direct trade-off management during sequence generation. Guided Generation and Sampling Methods steer diffusion models or other generative processes toward desired property profiles without full retraining, while Antibody-Specific Design Pipelines address the unique constraints of therapeutic antibody development. Generative Models for De Novo Protein Design explore foundational architectures for creating novel sequences, and Developability Prediction and Assessment provide the predictive tools needed to evaluate manufacturability and stability. Foundations and Principles of Protein Design and Specialized Design Applications round out the taxonomy by covering theoretical underpinnings and domain-specific challenges such as enzyme design or binder engineering.

A particularly active area involves reconciling structural designability with developability constraints, where works like Designability Preference Optimization[2] and Multi-objective Antibody Design[1] demonstrate how preference alignment can simultaneously optimize binding affinity, stability, and manufacturability. Property-Driven Inverse Folding[0] sits within this preference-based optimization cluster, emphasizing the integration of multiple property objectives directly into the inverse folding process rather than treating them as post-hoc filters. This contrasts with approaches like Multi-objective Binder Design[26], which may rely more heavily on guided sampling or iterative refinement. Nearby methods such as Guided Discrete Diffusion[3] illustrate alternative strategies that guide generative processes without explicit preference models, highlighting an ongoing tension between end-to-end optimization and modular design pipelines. The central challenge remains how to efficiently explore the high-dimensional space of sequences that satisfy both biophysical plausibility and practical therapeutic requirements.

Claimed Contributions

ProtAlign multi-objective preference alignment framework

The authors propose ProtAlign, a framework that aligns pretrained inverse folding models with both designability and multiple developability properties (such as solubility, thermostability, and expression) without requiring target-dependent hyperparameter tuning or domain expertise.

1 retrieved paper (Can Refute)

Semi-online Direct Preference Optimization with flexible preference margin

The authors develop a novel semi-online DPO algorithm that uses an adaptive preference margin to balance competing developability objectives while maintaining sequence-structure fidelity during optimization.

10 retrieved papers

MoMPNN model for property-driven protein design

The authors present MoMPNN, a model created by applying ProtAlign to ProteinMPNN, which improves developability properties while maintaining designability across various protein design tasks including crystal structures, de novo backbones, and binder design.

6 retrieved papers
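
To make the second contribution above more concrete, the following is a minimal sketch of a per-pair DPO loss with a preference margin. The base term is the standard DPO objective; subtracting a margin inside the sigmoid is one common way to add a margin and is an assumption here, since this report does not give the exact ProtAlign formulation. The margin is left as a plain argument because the report does not say how the flexible margin is set. All names and numeric values are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_margin_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
    margin: float = 0.0,
) -> float:
    """DPO loss for one (chosen, rejected) pair with an additive margin.

    Inputs are sequence log-likelihoods under the fine-tuned policy and
    the frozen reference model. With margin=0 this is standard DPO.
    Where the margin enters is an assumption of this sketch.
    """
    chosen_gain = policy_logp_chosen - ref_logp_chosen
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_gain - rejected_gain)
    return -log_sigmoid(logits - margin)


# Policy equal to the reference: the loss is log(2) when margin is zero.
base = dpo_margin_loss(-10.0, -12.0, -10.0, -12.0)
# A positive margin asks for a larger preference gap, so the loss rises.
with_margin = dpo_margin_loss(-10.0, -12.0, -10.0, -12.0, margin=0.5)
# Raising the chosen sequence's likelihood relative to the reference lowers the loss.
improved = dpo_margin_loss(-9.0, -12.0, -10.0, -12.0)
print(base, with_margin, improved)
```

In a semi-online setting, the pairs fed to such a loss would be regenerated periodically from the current policy instead of being fixed once up front; the regeneration schedule is not specified in this report.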
