Fast and Interpretable Protein Substructure Alignment via Optimal Transport

ICLR 2026 Conference Submission, Anonymous Authors

OpenReview Score: 6.0

Keywords: Protein substructure alignment, Residue-level representation, Optimal transport, Deep learning, Structural bioinformatics

Abstract:

Proteins are essential biological macromolecules that execute life functions. Local motifs within protein structures, such as active sites, are the most critical components for linking structure to function and are key to understanding protein evolution and enabling protein engineering. Existing computational methods struggle to identify and compare these local structures, which leaves a significant gap in understanding protein structures and harnessing their functions. This study presents PLASMA, the first deep learning framework for efficient and interpretable residue-level protein substructure alignment. We reformulate the problem as a regularized optimal transport task and leverage differentiable Sinkhorn iterations. For a pair of input protein structures, PLASMA outputs a clear alignment matrix with an interpretable overall similarity score. Through extensive quantitative evaluations and three biological case studies, we demonstrate that PLASMA achieves accurate, lightweight, and interpretable residue-level alignment. Additionally, we introduce PLASMA-PF, a training-free variant that provides a practical alternative when training data are unavailable. Our method addresses a critical gap in protein structure analysis tools and offers new opportunities for functional annotation, evolutionary studies, and structure-based drug design. Reproducibility is ensured via our official implementation at https://anonymous.4open.science/r/plasma-5A5B/.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

PLASMA introduces a deep learning framework for residue-level protein substructure alignment by reformulating the task as a regularized optimal transport problem with differentiable Sinkhorn iterations. It resides in the 'Deep Learning Alignment Models' leaf under 'Machine Learning and Neural Network Methods', which contains only two of the fifty papers in the taxonomy. This sparse population suggests that applying deep learning to residue-level local alignment is relatively underexplored compared to geometric or sequence-structure integration methods, which collectively account for over twenty papers across multiple leaves.

The taxonomy reveals that most alignment research concentrates on geometric transformations (Direct Coordinate Superposition, Spatial Indexing) and sequence-structure integration (Profile Threading, Sequence-Guided Alignment), with local substructure detection forming a smaller but distinct cluster. PLASMA's optimal transport formulation diverges from both classical geometric superposition methods and graph-based approaches, instead treating alignment as a distribution-matching problem. Its sibling paper in the same leaf employs graph convolutions and attention mechanisms, indicating that even within the nascent deep learning alignment subfield, methodological diversity is emerging.

Among the twenty-four candidates examined, the core PLASMA framework (Contribution A) shows no clear refutation across four candidates, suggesting that limited prior work directly addresses optimal transport for local structural alignment. However, the Label Match Loss (Contribution B, ten candidates examined, one refutable) and the training-free variant PLASMA-PF (Contribution C, ten candidates examined, two refutable) encounter more substantial overlap. These statistics indicate that while the overall framework may be novel, specific technical components (handling incomplete annotations and parameter-free alignment) have precedents in the limited literature surveyed.

Based on the top twenty-four semantic matches, PLASMA appears to occupy a relatively unexplored niche at the intersection of optimal transport theory and residue-level alignment. The analysis does not exhaustively cover domain-specific databases or recent preprints, so additional related work may exist beyond this scope. The framework's novelty seems strongest in its core formulation, with incremental aspects in auxiliary training strategies and parameter-free variants.

Taxonomy

Research Landscape Overview

Core task: residue-level protein substructure alignment. The field encompasses a diverse set of strategies for matching protein fragments at fine granularity, ranging from purely geometric methods that superimpose spatial coordinates to sequence-structure integration approaches that blend evolutionary signals with three-dimensional information. Geometric and spatial alignment methods (e.g., Protein Structure Alignment[1], FAST Algorithm[2]) emphasize rigid-body transformations and distance-based scoring, while local substructure and motif detection techniques (e.g., ProBiS[5]) focus on identifying conserved binding sites or functional patches. Hierarchical and multi-scale alignment strategies (e.g., Hierarchical Superposition[12]) decompose structures into nested levels, and graph and network-based approaches represent residues as nodes to exploit topological relationships. Machine learning and neural network methods have recently emerged as a distinct branch, leveraging deep architectures to learn alignment features directly from data (e.g., SAFoldNet[8], Pocket Pretraining[7]), while optimization and search algorithms (e.g., Genetic Algorithm[42]) tackle the combinatorial challenge of finding optimal correspondences. Multiple structure alignment, specialized scoring measures, and high-throughput search tools address scalability and comparative analysis across large datasets.

Within this landscape, a particularly active line of work explores how neural networks can capture complex structural patterns that classical heuristics may miss. Optimal Transport Alignment[0] sits squarely in the machine learning and neural network methods branch, employing optimal transport theory to learn residue-level correspondences in a differentiable framework. This contrasts with earlier deep learning models like SAFoldNet[8], which relies on graph convolutions and attention mechanisms, by framing alignment as a distribution-matching problem rather than a direct feature-embedding task. Meanwhile, geometric methods such as GTalign[3] continue to refine spatial superposition with advanced scoring functions, highlighting an ongoing tension between data-driven flexibility and interpretable, physics-inspired criteria. The interplay between these approaches (whether to encode domain knowledge explicitly or to let neural architectures discover it) remains a central open question, with Optimal Transport Alignment[0] representing a recent effort to blend mathematical rigor with end-to-end learning.

Claimed Contributions

PLASMA framework for residue-level local structural alignment via optimal transport

4 retrieved papers

The authors introduce PLASMA, a framework that reformulates protein substructure alignment as a regularized optimal transport problem solved using differentiable Sinkhorn iterations. This approach produces interpretable alignment matrices and normalized similarity scores for comparing local structural motifs across proteins.
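To make the formulation concrete, the following is a minimal sketch of entropy-regularized optimal transport solved with Sinkhorn iterations, in plain Python. It is not the authors' implementation: the function name `sinkhorn`, the uniform marginals, and the parameters `epsilon` and `n_iters` are assumptions chosen for illustration, and a substructure-alignment method would likely need a partial or otherwise relaxed variant rather than this balanced form.

```python
import math

def sinkhorn(cost, epsilon=0.1, n_iters=200):
    """Entropy-regularized OT between two uniform distributions (illustrative).

    cost[i][j] is an assumed cost of matching residue i of one protein
    to residue j of the other. Returns a transport plan whose rows sum
    to 1/n and whose columns sum to 1/m (after convergence).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n          # uniform mass on the first protein's residues
    b = [1.0 / m] * m          # uniform mass on the second protein's residues
    # Gibbs kernel: small cost -> large kernel value
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy cost matrix: residue i is cheapest to match with residue i.
toy_cost = [[0.0, 1.0, 1.0],
            [1.0, 0.0, 1.0],
            [1.0, 1.0, 0.0]]
plan = sinkhorn(toy_cost)
best_match = [max(range(3), key=lambda j: plan[i][j]) for i in range(3)]
```

Each row of `plan` can be read as a soft assignment of one residue to the residues of the other protein, which is the sense in which such a matrix is interpretable. Every update is a differentiable arithmetic operation, so the same loop could sit inside a trainable model.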

Label Match Loss for training with incomplete annotations

Can Refute

10 retrieved papers

The authors develop a novel loss function called Label Match Loss that enables training on partially annotated data by focusing only on known functional substructures while avoiding penalties on unlabeled but potentially valid matches.
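The idea of training only on known annotations can be illustrated with a masked loss. The sketch below is a hypothetical stand-in, not the paper's Label Match Loss: the function name `masked_match_loss`, the input layout, and the choice of a negative log-likelihood are all assumptions. It scores only the residue pairs that are annotated as matching and includes no term for any other entry, so unlabeled pairs are never explicitly penalized.

```python
import math

def masked_match_loss(alignment, labeled_pairs, eps=1e-12):
    """Mean negative log-score over annotated residue pairs only (illustrative).

    alignment[i][j] is an assumed match score in [0, 1] for residue i of one
    protein and residue j of the other. labeled_pairs lists the (i, j) pairs
    known to correspond. Entries outside labeled_pairs contribute nothing.
    """
    if not labeled_pairs:
        return 0.0  # nothing annotated, nothing to penalize
    total = 0.0
    for i, j in labeled_pairs:
        total += -math.log(alignment[i][j] + eps)  # eps guards against log(0)
    return total / len(labeled_pairs)

alignment = [[0.9, 0.1],
             [0.2, 0.8]]
# Only the pair (0, 0) is annotated; the score at (1, 1) is left unjudged.
loss = masked_match_loss(alignment, [(0, 0)])
```

Changing `alignment[1][1]` leaves `loss` unchanged in this sketch, which is the behavior the description above asks for: a high score on an unannotated pair is treated as unknown rather than wrong.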

PLASMA-PF training-free variant

Can Refute

10 retrieved papers

The authors present PLASMA-PF, a parameter-free variant of their framework that operates directly on residue embeddings without requiring task-specific training, offering a practical baseline when labeled data is scarce.
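A training-free pipeline of this kind needs a cost between residues that comes straight from fixed embeddings. One plausible choice, shown below as a sketch, is cosine distance. The report does not say which cost PLASMA-PF uses, so the function name `cosine_cost_matrix` and the cosine choice are assumptions made for illustration. The resulting matrix could then be passed to any entropy-regularized optimal transport solver.

```python
import math

def cosine_cost_matrix(query_emb, target_emb):
    """Pairwise cosine distance between two sets of residue embeddings.

    query_emb and target_emb are lists of equal-length float vectors, e.g.
    from a frozen pretrained encoder (an assumption). No parameters are
    learned. Returns cost[i][j] = 1 - cos(query_emb[i], target_emb[j]).
    """
    def norm(vec):
        return math.sqrt(sum(x * x for x in vec))

    cost = []
    for q in query_emb:
        nq = norm(q)
        row = []
        for t in target_emb:
            nt = norm(t)
            dot = sum(x * y for x, y in zip(q, t))
            # Treat a zero vector as having no similarity to anything
            sim = dot / (nq * nt) if nq > 0.0 and nt > 0.0 else 0.0
            row.append(1.0 - sim)
        cost.append(row)
    return cost

# Toy 2-dimensional embeddings: two query residues, three target residues.
query = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
cost = cosine_cost_matrix(query, target)
```

Cosine distance ignores vector length, so `target[1]` is at zero cost from `query[1]` despite being twice as long. Whether that invariance is desirable depends on the embedding model.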

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[8] SAFoldNet: A Novel Tool for Discovering and Aligning Three-Dimensional Protein Structures Based on a Neural Network

Denis V. Petrovskiy, Kirill S. Nikolsky, Vladimir R. Rudnev, Liudmila I. Kulikova, Tatiana V. Butkova, Kristina A. Malsagova, Arthur T. Kopylov, Anna L. Kaysheva (2023)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

PLASMA framework for residue-level local structural alignment via optimal transport

[51] Aggregating residue-level protein language model embeddings with optimal transport

Cannot Refute

[52] OT-RMSD: A Generalization of the Root-Mean-Square Deviation for Aligning Unequal-Length Protein Structures

Cannot Refute

[53] Joining softassign and dynamic programming for the contact map overlap problem

Cannot Refute

[54] Proteins comparison through probabilistic optimal structure local alignment

Cannot Refute

Contribution

Label Match Loss for training with incomplete annotations

[72] Data-efficient active learning for structured prediction with partial annotation and self-training

Can Refute

[65] A robust framework for topology-based anomaly detection in attributed networks using graph attention networks, substructure analysis, and data augmentation

Cannot Refute

[66] 3D medical image segmentation with sparse annotation via cross-teaching between 3D and 2D networks

Cannot Refute

[67] SAM: Self-supervised learning of pixel-wise anatomical embeddings in radiological images

Cannot Refute

[68] Metastable Substructure Embedding and Robust Classification of Multichannel EEG Data Using Spectral Graph Kernels

Cannot Refute

[69] Antiviral discovery using sparse datasets by integrating experiments, molecular simulations, and machine learning

Cannot Refute

[70] Estimating bridge stress histories at remote locations from vibration sparse monitoring

Cannot Refute

[71] Substructural damage identification using autoregressive moving average with exogenous inputs model and sparse regularization

Cannot Refute

[73] Deep neural network automated segmentation of cellular structures in volume electron microscopy

Cannot Refute

[74] Towards robust partially supervised multi-structure medical image segmentation on small-scale data

Cannot Refute

Contribution

PLASMA-PF training-free variant

[56] Embedding-based alignment: combining protein language models with dynamic programming alignment to detect structural similarities in the twilight-zone

Can Refute

[60] Embedding-based alignment: combining protein language models and alignment approaches to detect structural similarities in the twilight-zone

Can Refute

[55] Contrastive learning on protein embeddings enlightens midnight zone

Cannot Refute

[57] Protein structural alignments from sequence

Cannot Refute

[58] SHARK-capture identifies functional motifs in intrinsically disordered protein regions

Cannot Refute

[59] Fast and adaptive protein structure representations for machine learning

Cannot Refute

[61] Single-Sequence, Structure Free Allosteric Residue Prediction with Protein Language Models

Cannot Refute

[62] Structure- and function-aware substitution matrices via learnable graph matching

Cannot Refute

[63] Protein language model embeddings for fast, accurate, alignment-free protein structure prediction

Cannot Refute

[64] Covary: A translation-aware framework for alignment-free phylogenetics using machine learning

Cannot Refute
