Fast and Interpretable Protein Substructure Alignment via Optimal Transport

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Protein substructure alignment, Residue-level representation, Optimal transport, Deep learning, Structural bioinformatics
Abstract:

Proteins are essential biological macromolecules that execute life functions. Local motifs within protein structures, such as active sites, are the most critical components for linking structure to function and are key to understanding protein evolution and enabling protein engineering. Existing computational methods struggle to identify and compare these local structures, which leaves a significant gap in understanding protein structures and harnessing their functions. This study presents PLASMA, the first deep learning framework for efficient and interpretable residue-level protein substructure alignment. We reformulate the problem as a regularized optimal transport task and leverage differentiable Sinkhorn iterations. For a pair of input protein structures, PLASMA outputs a clear alignment matrix with an interpretable overall similarity score. Through extensive quantitative evaluations and three biological case studies, we demonstrate that PLASMA achieves accurate, lightweight, and interpretable residue-level alignment. Additionally, we introduce PLASMA-PF, a training-free variant that provides a practical alternative when training data are unavailable. Our method addresses a critical gap in protein structure analysis tools and offers new opportunities for functional annotation, evolutionary studies, and structure-based drug design. Reproducibility is ensured via our official implementation at https://anonymous.4open.science/r/plasma-5A5B/.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

PLASMA introduces a deep learning framework for residue-level protein substructure alignment by reformulating the task as a regularized optimal transport problem solved with differentiable Sinkhorn iterations. It resides in the 'Deep Learning Alignment Models' leaf under 'Machine Learning and Neural Network Methods', which contains only two of the fifty papers in the taxonomy. This sparse population suggests that applying deep learning to residue-level local alignment is relatively underexplored compared with geometric or sequence-structure integration methods, which together account for over twenty papers across multiple leaves.
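
For orientation, the generic entropy-regularized optimal transport problem that Sinkhorn iterations solve can be written as below. The notation (cost matrix C between residues of the two proteins, marginals a and b, regularization strength ε) is standard textbook notation assumed here for illustration; the exact cost, marginal constraints, and regularizer used by PLASMA are not stated in this report and may differ.

```latex
\min_{P \in U(a,b)} \; \langle P, C \rangle - \varepsilon H(P),
\qquad
U(a,b) = \left\{ P \in \mathbb{R}_{\ge 0}^{n \times m} : P\mathbf{1}_m = a,\; P^{\top}\mathbf{1}_n = b \right\},
\qquad
H(P) = -\sum_{i,j} P_{ij}\left(\log P_{ij} - 1\right)

% The minimizer has the form P^\star = \mathrm{diag}(u)\, K \,\mathrm{diag}(v) with K = \exp(-C/\varepsilon),
% and Sinkhorn alternates the updates u \leftarrow a / (K v), \; v \leftarrow b / (K^{\top} u).
```

Because each update is a differentiable elementwise operation, gradients can flow through a fixed number of iterations, which is what makes this formulation compatible with end-to-end training.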

The taxonomy reveals that most alignment research concentrates on geometric transformations (Direct Coordinate Superposition, Spatial Indexing) and sequence-structure integration (Profile Threading, Sequence-Guided Alignment), with local substructure detection forming a smaller but distinct cluster. PLASMA's optimal transport formulation diverges from both classical geometric superposition methods and graph-based approaches, instead treating alignment as a distribution-matching problem. Its sibling paper in the same leaf employs graph convolutions and attention mechanisms, indicating that even within the nascent deep learning alignment subfield, methodological diversity is emerging.

Among the twenty-four candidates examined, the core PLASMA framework (Contribution A) shows no clear refutation across its four candidates, suggesting that little prior work directly addresses optimal transport for local structural alignment. However, the Label Match Loss (Contribution B, ten candidates examined, one refutable) and the training-free variant PLASMA-PF (Contribution C, ten candidates examined, two refutable) encounter more substantial overlap. These statistics indicate that while the overall framework may be novel, two specific technical components (handling incomplete annotations and parameter-free alignment) have precedents in the limited literature surveyed.

Based on the top twenty-four semantic matches, PLASMA appears to occupy a relatively unexplored niche at the intersection of optimal transport theory and residue-level alignment. The analysis does not exhaustively cover domain-specific databases or recent preprints, so additional related work may exist beyond this scope. The framework's novelty seems strongest in its core formulation, with more incremental contributions in the auxiliary training strategy and the parameter-free variant.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 3

Research Landscape Overview

Core task: residue-level protein substructure alignment.

The field encompasses a diverse set of strategies for matching protein fragments at fine granularity, ranging from purely geometric methods that superimpose spatial coordinates to sequence-structure integration approaches that blend evolutionary signals with three-dimensional information. Geometric and spatial alignment methods (e.g., Protein Structure Alignment[1], FAST Algorithm[2]) emphasize rigid-body transformations and distance-based scoring, while local substructure and motif detection techniques (e.g., ProBiS[5]) focus on identifying conserved binding sites or functional patches. Hierarchical and multi-scale alignment strategies (e.g., Hierarchical Superposition[12]) decompose structures into nested levels, and graph and network-based approaches represent residues as nodes to exploit topological relationships. Machine learning and neural network methods have recently emerged as a distinct branch, leveraging deep architectures to learn alignment features directly from data (e.g., SAFoldNet[8], Pocket Pretraining[7]), while optimization and search algorithms (e.g., Genetic Algorithm[42]) tackle the combinatorial challenge of finding optimal correspondences. Multiple structure alignment, specialized scoring measures, and high-throughput search tools address scalability and comparative analysis across large datasets.

Within this landscape, a particularly active line of work explores how neural networks can capture complex structural patterns that classical heuristics may miss. Optimal Transport Alignment[0] sits squarely in the machine learning and neural network methods branch, employing optimal transport theory to learn residue-level correspondences in a differentiable framework. This contrasts with earlier deep learning models like SAFoldNet[8], which relies on graph convolutions and attention mechanisms, by framing alignment as a distribution-matching problem rather than a direct feature-embedding task. Meanwhile, geometric methods such as GTalign[3] continue to refine spatial superposition with advanced scoring functions, highlighting an ongoing tension between data-driven flexibility and interpretable, physics-inspired criteria. The interplay between these approaches, namely whether to encode domain knowledge explicitly or to let neural architectures discover it, remains a central open question, with Optimal Transport Alignment[0] representing a recent effort to blend mathematical rigor with end-to-end learning.

Claimed Contributions

PLASMA framework for residue-level local structural alignment via optimal transport

The authors introduce PLASMA, a framework that reformulates protein substructure alignment as a regularized optimal transport problem solved using differentiable Sinkhorn iterations. This approach produces interpretable alignment matrices and normalized similarity scores for comparing local structural motifs across proteins.
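
The formulation described above can be illustrated with a minimal NumPy sketch of Sinkhorn iterations on a residue-by-residue cost matrix. Everything here is an assumption made for illustration rather than the authors' implementation: the function names, the cosine cost on per-residue embeddings, the uniform marginals, the values of `epsilon` and `n_iters`, and the `similarity_score` definition are all hypothetical. In particular, balanced uniform marginals force every residue to be matched, which is a simplification compared with true substructure (partial) alignment.

```python
import numpy as np


def cosine_cost(x, y):
    """Cost matrix 1 - cosine similarity between rows of x (n, d) and y (m, d)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T


def sinkhorn_plan(cost, epsilon=0.5, n_iters=500):
    """Entropy-regularized OT plan with uniform marginals (an assumption)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # mass on each residue of protein A
    b = np.full(m, 1.0 / m)  # mass on each residue of protein B
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)    # rescale rows toward marginal a
        v = b / (kernel.T @ u)  # rescale columns toward marginal b
    return u[:, None] * kernel * v[None, :]


def similarity_score(plan, cost):
    """Hypothetical score: 1 minus the transport cost (in [-1, 1] for cosine cost)."""
    return 1.0 - float((plan * cost).sum())


# Usage with random stand-ins for residue embeddings (5 and 7 residues, dim 8).
rng = np.random.default_rng(0)
query = rng.normal(size=(5, 8))
target = rng.normal(size=(7, 8))
cost = cosine_cost(query, target)
plan = sinkhorn_plan(cost)
print(plan.shape, similarity_score(plan, cost))
```

Each row of `plan` can be read as a soft assignment of one query residue over the target residues, which is the sense in which an alignment matrix of this kind is inspectable. A smaller `epsilon` gives sharper assignments but typically needs more iterations or a log-domain implementation for numerical stability.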

4 retrieved papers

Label Match Loss for training with incomplete annotations

The authors develop a novel loss function called Label Match Loss that enables training on partially annotated data by focusing only on known functional substructures while avoiding penalties on unlabeled but potentially valid matches.
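
The idea of penalizing only annotated correspondences can be sketched as a masked loss. This is not the paper's Label Match Loss, whose exact form is not given in this report; it is a hypothetical stand-in (the name `masked_match_loss`, the negative-log form, and the input convention of a matrix of match probabilities in (0, 1] are all assumptions) that shows the general pattern of ignoring unlabeled entries.

```python
import numpy as np


def masked_match_loss(match_prob, labeled_pairs, eps=1e-12):
    """Mean negative log-probability over annotated residue pairs only.

    match_prob    : (n, m) array of soft match probabilities in (0, 1].
    labeled_pairs : list of (i, j) index pairs known to correspond.
    Entries outside labeled_pairs are never read, so unlabeled but
    possibly valid matches are neither rewarded nor penalized.
    """
    rows = np.array([i for i, _ in labeled_pairs])
    cols = np.array([j for _, j in labeled_pairs])
    return float(-np.log(match_prob[rows, cols] + eps).mean())


# Usage: two annotated pairs in a 3 x 4 alignment matrix.
probs = np.full((3, 4), 0.1)
pairs = [(0, 1), (2, 3)]
print(masked_match_loss(probs, pairs))
```

A dense loss such as binary cross-entropy over the full matrix would instead treat every unlabeled entry as a negative, which is the failure mode that training on incomplete annotations needs to avoid.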

10 retrieved papers (Can Refute)

PLASMA-PF training-free variant

The authors present PLASMA-PF, a parameter-free variant of their framework that operates directly on residue embeddings without requiring task-specific training, offering a practical baseline when labeled data is scarce.

10 retrieved papers (Can Refute)
