Reverse Distillation: Disentangling and Scaling Protein Language Model Representations

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: protein language models, model scaling, representation learning, subspace decomposition, interpretability, model distillation
Abstract:

Unlike the foundation model scaling laws seen in natural language processing and computer vision, biological foundation models scale relatively poorly. For example, the ESM-2 family of protein language models plateaus at 650M-3B parameters on ProteinGym benchmarks. We address this limitation by introducing Reverse Distillation, a principled framework that decomposes large protein language model representations into orthogonal subspaces guided by smaller models of the same family. We hypothesize that this decomposition matches the natural hierarchy of protein properties, where broad features like secondary structure are robustly captured by compact, smaller models while the residual capacity of larger models specializes in protein-family specific functions. Our method is theoretically grounded and enables monotonic scaling: larger reverse-distilled models consistently outperform their smaller counterparts, overcoming the scaling plateau. Moreover, on ProteinGym benchmarks, reverse-distilled ESM-2 variants broadly outperform their respective baseline models at the same embedding dimensionality. Our approach offers a generalizable framework for disentangling hierarchical feature spaces in foundation model embeddings, with potential applications across biology and other domains where scaling challenges persist.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), so the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases; human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Reverse Distillation, a framework for decomposing protein language model representations into orthogonal subspaces guided by smaller models. Within the taxonomy, it occupies the sole position in the 'Hierarchical Feature Disentanglement via Reverse Distillation' leaf under 'Representation Decomposition and Distillation Methods'. This leaf contains only the original paper itself, indicating a sparse research direction with no sibling papers identified in the taxonomy structure. The approach targets the scaling plateau observed in ESM-2 models, aiming to enable monotonic performance improvements as model size increases.

The taxonomy reveals two main branches: representation decomposition methods and retrieval-based augmentation approaches. The original paper sits within the decomposition branch, which focuses on internal restructuring of learned features rather than external data augmentation. The neighboring 'Retrieved Sequence Augmentation' leaf represents an alternative strategy that enhances representations through database retrieval. The taxonomy's scope notes clarify that methods using model-guided decomposition belong in the original paper's branch, while those relying on external sequence databases fall into the retrieval category, suggesting these represent distinct methodological paradigms within protein language model research.

Among the 25 candidates examined across the three contributions, no clearly refutable prior work was identified: 10 candidates were compared against the core Reverse Distillation framework, 10 against the hierarchical decomposition with theoretical guarantees, and 5 against the Matryoshka-style embeddings, with zero refutations in each case. In other words, no direct overlaps were detected within the top-25 semantically similar papers. However, this is not an exhaustive literature review, and the sparse taxonomy structure (only one paper in the leaf) may reflect either genuine novelty or limitations of the search methodology.

Based on the limited search of 25 candidates, the work appears to occupy a relatively unexplored niche within protein language model scaling. The absence of sibling papers in the taxonomy and zero refutable candidates across all contributions suggest potential novelty, though this assessment is constrained by the search scope. The taxonomy structure indicates the field has alternative approaches (retrieval-based methods) but limited prior work specifically on reverse distillation for hierarchical feature decomposition in protein models.

Taxonomy

Core-task Taxonomy Papers: 1
Claimed Contributions: 3
Contribution Candidate Papers Compared: 25
Refutable Papers: 0

Research Landscape Overview

Core task: Improving scaling behavior of protein language models through representation decomposition. The field addresses how to enhance protein language models by breaking down their learned representations into more interpretable or efficient components. The taxonomy reveals two main branches: one focused on Representation Decomposition and Distillation Methods, which explores techniques for disentangling hierarchical features and compressing knowledge from larger models, and another centered on Retrieval-Based Augmentation for Representation Learning, which leverages external sequence databases to enrich model representations. These branches reflect complementary strategies, internal restructuring of learned features versus external augmentation through retrieved information, both aiming to improve model scalability and performance on protein-related tasks.

Within the decomposition branch, works like Reverse Distillation[0] pursue hierarchical feature disentanglement by reversing traditional distillation flows, aiming to isolate different levels of structural or functional information encoded in protein representations. This contrasts with retrieval-based approaches such as Retrieved Sequence Augmentation[1], which augment model inputs or embeddings by incorporating similar sequences from large databases, thereby grounding predictions in evolutionary context.

The original paper sits squarely within the decomposition paradigm, emphasizing how reverse distillation can untangle complex representations to achieve better scaling properties. Compared to retrieval methods that rely on external data, Reverse Distillation[0] focuses on intrinsic model architecture and training dynamics, offering a complementary path toward more efficient and interpretable protein language models as datasets and model sizes continue to grow.

Claimed Contributions

Reverse Distillation framework for decomposing protein language model representations

The authors propose a method that decomposes large protein language model representations into orthogonal subspaces guided by smaller models from the same family. This decomposition separates universal features (captured by smaller models) from specialized features (unique to larger models), addressing the scaling plateau observed in biological foundation models.

Retrieved papers compared: 10
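The decomposition claimed above can be illustrated in a few lines. The sketch below is a minimal reading, not the authors' implementation: it treats the "universal" component as the least-squares projection of the large model's embeddings onto the span of the small model's embeddings, leaving an orthogonal "specialized" residual. All array names, shapes, and the linear-projection choice are assumptions for illustration.

```python
import numpy as np

# Toy stand-ins for per-sequence embeddings; shapes are hypothetical.
rng = np.random.default_rng(0)
Z_small = rng.normal(size=(100, 8))   # embeddings from the smaller model
Z_large = rng.normal(size=(100, 32))  # embeddings from the larger model

# Least-squares fit: the part of the large model's embeddings that is
# linearly predictable from the small model's embeddings ("universal").
W, *_ = np.linalg.lstsq(Z_small, Z_large, rcond=None)
Z_universal = Z_small @ W

# The leftover "specialized" part is orthogonal to the span of Z_small,
# a standard property of least-squares residuals.
Z_residual = Z_large - Z_universal
assert np.abs(Z_small.T @ Z_residual).max() < 1e-8
```

Any guidance from the smaller model richer than a linear map (the paper's exact construction) would replace the `lstsq` step, but the orthogonal universal/specialized split is the same idea.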
Hierarchical decomposition with theoretical optimality guarantees

The method provides a theoretically grounded hierarchical decomposition where each model scale contributes orthogonal information. The authors prove this decomposition is MSE-optimal among all representations that preserve the smaller model's embeddings, ensuring quality approximation of the original space.

Retrieved papers compared: 10
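The MSE-optimality claim has a classical numerical analogue. Under the (assumed) reading that the specialized residual is summarized by its top principal directions, the Eckart-Young theorem guarantees no other rank-k summary achieves lower reconstruction MSE. The sketch below checks this against a random subspace; all names and shapes are illustrative, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
Z_small = rng.normal(size=(200, 4))   # hypothetical small-model embeddings
Z_large = rng.normal(size=(200, 16))  # hypothetical large-model embeddings

# Residual after removing the part of Z_large predictable from Z_small.
W, *_ = np.linalg.lstsq(Z_small, Z_large, rcond=None)
R = Z_large - Z_small @ W

# Truncated SVD keeps the top-k principal directions of the residual;
# by Eckart-Young this is the MSE-optimal rank-k approximation of R.
k = 4
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_k = (U[:, :k] * s[:k]) @ Vt[:k]
mse_opt = np.mean((R - R_k) ** 2)

# Any other rank-k summary (here: projection onto a random k-dimensional
# subspace) can only do worse in mean squared error.
Q, _ = np.linalg.qr(rng.normal(size=(16, k)))
mse_rand = np.mean((R - R @ Q @ Q.T) ** 2)
assert mse_opt <= mse_rand
```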
Matryoshka-style embeddings enabling monotonic scaling

The framework produces embeddings with a nested prefix structure where smaller-dimensional prefixes correspond to valid reverse-distilled representations at that scale. This enables controlled performance degradation as embedding size decreases and restores monotonic scaling behavior where larger models consistently outperform smaller ones.

Retrieved papers compared: 5
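The nested-prefix property is simple to state in code. The sketch below is a schematic of the Matryoshka-style layout described above, with made-up block sizes: the full embedding concatenates per-scale blocks, smallest scale first, so truncating to a prefix recovers the representation at a smaller scale.

```python
import numpy as np

def matryoshka_concat(blocks):
    """Concatenate per-scale blocks so that prefixes recover smaller scales."""
    return np.concatenate(blocks, axis=-1)

# Hypothetical per-scale blocks: the smallest scale's representation comes
# first, followed by each larger scale's orthogonal residual summary.
z_scale0 = np.array([0.1, 0.2])  # smallest scale (2 dims)
z_scale1 = np.array([0.3])       # residual added by the mid scale
z_scale2 = np.array([0.4, 0.5])  # residual added by the largest scale

z_full = matryoshka_concat([z_scale0, z_scale1, z_scale2])

# Truncating to a prefix yields a valid representation at that scale,
# which is what permits controlled degradation as dimensionality shrinks.
assert np.allclose(z_full[:2], z_scale0)
assert np.allclose(z_full[:3], matryoshka_concat([z_scale0, z_scale1]))
```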

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Reverse Distillation framework for decomposing protein language model representations

The authors propose a method that decomposes large protein language model representations into orthogonal subspaces guided by smaller models from the same family. This decomposition separates universal features (captured by smaller models) from specialized features (unique to larger models), addressing the scaling plateau observed in biological foundation models.

Contribution

Hierarchical decomposition with theoretical optimality guarantees

The method provides a theoretically grounded hierarchical decomposition where each model scale contributes orthogonal information. The authors prove this decomposition is MSE-optimal among all representations that preserve the smaller model's embeddings, ensuring quality approximation of the original space.

Contribution

Matryoshka-style embeddings enabling monotonic scaling

The framework produces embeddings with a nested prefix structure where smaller-dimensional prefixes correspond to valid reverse-distilled representations at that scale. This enables controlled performance degradation as embedding size decreases and restores monotonic scaling behavior where larger models consistently outperform smaller ones.