LS-Merge: Merging Language Models in Latent Space

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: LS-Merge, LLM merging, latent space, weight space learning
Abstract:

Model merging in weight space is an efficient way to reuse pretrained models, but existing methods typically assume matching architectures or sizes, making heterogeneous merges brittle or infeasible. We address this limitation by encoding model weights into a smooth latent space, enabling cross-architecture operations, and performing the merge in the latent space before decoding back to weights. This approach faces two major challenges. First, LLMs contain billions of parameters, which makes latent encoding computationally demanding. Second, using high compression ratios often hinders the encoder’s ability to generalize to unseen weights. We tackle these issues with a transformer-based variational autoencoder (VAE) trained in a two-stage compression curriculum with structured layer-aware chunking: the model first learns a high-capacity latent representation and then distills to a compact code, improving both stability and out-of-distribution generalization. To align heterogeneous models, we introduce a dimensionality-matching projection that allows interpolation between models of different sizes. Empirically, latent-space interpolation is consistently more robust than direct weight-space averaging and yields stronger downstream performance when merging models of different sizes. Together, these components provide a scalable, architecture-agnostic recipe for model merging.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes LS-Merge, a framework that encodes model weights into a latent space using a transformer-based VAE, enabling cross-architecture merging operations before decoding back to weights. This work resides in the 'Latent-Space and Embedding-Based Merging' leaf, which contains only three papers including the original. This is a relatively sparse research direction within the broader taxonomy of 50 papers across 22 leaf nodes, suggesting that latent-space encoding approaches for heterogeneous model merging remain an emerging area compared to more established techniques like direct weight interpolation or ensemble methods.

The taxonomy reveals that this work sits within 'Parameter-Space Merging and Alignment Techniques', adjacent to 'Weight-Space Interpolation and Coefficient Optimization' (3 papers) and 'Layer-Level Integration and Permutation' (2 papers). These neighboring leaves focus on direct parameter manipulation without latent encoding, highlighting a methodological divergence. The broader taxonomy also includes 'Knowledge Transfer and Ensemble Collaboration' (5 papers) and 'Mixture-of-Experts Architectures' (3 papers), which preserve model independence rather than merging parameters. The scope notes clarify that latent-space methods explicitly exclude direct weight averaging and ensemble approaches, positioning this work as a distinct strategy for achieving heterogeneous integration through learned representations.

Among the 29 candidates examined, the contribution-level analysis reveals mixed novelty signals. For the core LS-Merge framework (Contribution 1), 10 candidates were examined and one appears to constitute overlapping prior work. For the dimensionality-matching projection and optimal-transport alignment (Contribution 2), 9 candidates were examined and two potentially refute it. For the two-stage compression curriculum with layer-aware chunking (Contribution 3), 10 candidates were examined and none clearly refutes it, suggesting this training strategy may be the most novel component within the limited search scope. These statistics indicate that while the overall latent-space merging concept has some precedent, specific technical components, particularly the compression curriculum, appear less explored in the examined literature.

Based on the limited search of 29 semantically similar papers, the work appears to occupy a relatively sparse research direction with modest prior overlap. The taxonomy structure confirms that latent-space encoding for heterogeneous merging is less crowded than direct weight-space methods or ensemble approaches. However, the analysis does not cover exhaustive literature search or systematic review of all related compression and alignment techniques, leaving open the possibility of additional relevant work outside the top-K semantic matches examined.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 3

Research Landscape Overview

Core task: merging language models with heterogeneous architectures. The field has evolved to address the challenge of combining models that differ in structure, training objectives, or domain specialization. The taxonomy reveals several complementary strategies: Parameter-Space Merging and Alignment Techniques focus on direct weight manipulation and embedding-based fusion, as explored in LS-Merge[0] and Mergenet[1]; Knowledge Transfer and Ensemble Collaboration emphasizes collaborative inference and distillation, as in Ensemble Heterogeneous LLMs[5] and Mixture-of-Agents[9]; Mixture-of-Experts Architectures provide modular frameworks for routing across heterogeneous components; and Multimodal and Cross-Domain Model Integration tackles the broader challenge of fusing models trained on different modalities or tasks. Domain-Specific Merging Applications and Optimization and Evaluation Frameworks address practical deployment and benchmarking concerns, with auxiliary supporting methods rounding out the landscape.

A particularly active line of work centers on latent-space and embedding-based merging, where models are aligned through learned transformations rather than naive parameter averaging. LS-Merge[0] exemplifies this direction by operating in a shared latent space to bridge architectural differences, alongside Mergenet[1], which also emphasizes learned alignment mechanisms, and Knowledge Fusion LLMs[2], which explores fusion at the representation level. These approaches contrast with ensemble methods such as Mixture-of-Agents[9], which preserve model independence during inference, and with mixture-of-experts frameworks such as CoMoE[42], which introduce explicit gating.

A recurring theme across branches is the trade-off between integration depth (whether to merge parameters directly, align intermediate representations, or coordinate outputs) and the preservation of specialized capabilities. Open questions include how to efficiently search the space of possible merges, as addressed by Evolutionary Model Merging[16], and how to evaluate merged models across diverse benchmarks without retraining from scratch.

Claimed Contributions

LS-Merge framework for merging LLMs in latent space

The authors introduce LS-Merge, a framework that encodes model weights into a smooth latent space using a transformer-based variational autoencoder, performs merging operations in this latent space, and decodes back to weights. This approach enables both homogeneous and heterogeneous model merging without requiring architectural alignment.

10 retrieved papers · Can Refute
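The encode-interpolate-decode recipe claimed here can be illustrated with a toy stand-in for the paper's transformer VAE: a hypothetical random linear encoder with its pseudo-inverse as decoder. All dimensions, names, and the linear encoder itself are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the paper's transformer VAE: a random linear encoder and its
# pseudo-inverse as decoder, compressing 64-dim weight chunks to 8-dim codes.
D, d = 64, 8
W_enc = rng.normal(size=(d, D)) / np.sqrt(D)
W_dec = np.linalg.pinv(W_enc)

def encode(w):
    """Map a flattened weight chunk to its latent code."""
    return W_enc @ w

def decode(z):
    """Map a latent code back to weight space."""
    return W_dec @ z

def ls_merge(w_a, w_b, alpha=0.5):
    """Merge two weight chunks by interpolating their latent codes."""
    z = (1 - alpha) * encode(w_a) + alpha * encode(w_b)
    return decode(z)

w_a, w_b = rng.normal(size=D), rng.normal(size=D)
merged = ls_merge(w_a, w_b, alpha=0.5)  # same shape as the input chunks
```

With a linear encoder this reduces to a projected weight average; the point of the learned VAE is that interpolation in its smoother latent space can behave better than interpolation in raw weight space.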
Dimensionality-matching projection and OT-based alignment for heterogeneous merging

The authors develop a method combining proportional dimensionality mapping with Optimal Transport alignment to enable merging of models with mismatched architectures (different depths or widths). This addresses the geometric incompatibility of latent distributions from heterogeneous models by registering their manifolds before interpolation.

9 retrieved papers · Can Refute
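The two ingredients of this contribution can be sketched in simplified form. Assumptions: the dimensionality-matching projection is reduced here to 1-D proportional index interpolation, and the optimal-transport alignment to per-dimension Gaussian moment matching (which is the exact OT map only in the 1-D Gaussian case); the paper's actual transforms are presumably richer.

```python
import numpy as np

def proportional_projection(z, target_dim):
    """Map a latent vector of one width onto another width by proportional
    index interpolation (a simplified stand-in for the paper's
    dimensionality-matching projection)."""
    src = np.linspace(0.0, 1.0, len(z))
    tgt = np.linspace(0.0, 1.0, target_dim)
    return np.interp(tgt, src, z)

def gaussian_ot_align(Z_src, Z_tgt):
    """Align per-dimension mean and std of Z_src (n x d samples) to Z_tgt.
    For 1-D Gaussians this moment matching is exactly the optimal-transport
    map; full OT over the joint latent distribution is strictly richer."""
    mu_s, sd_s = Z_src.mean(0), Z_src.std(0) + 1e-8
    mu_t, sd_t = Z_tgt.mean(0), Z_tgt.std(0) + 1e-8
    return (Z_src - mu_s) / sd_s * sd_t + mu_t
```

After projecting the narrower model's latents to the wider model's dimensionality and registering their distributions, plain interpolation between the aligned codes becomes meaningful.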
Two-stage compression curriculum with layer-aware chunking

The authors propose a training strategy that first learns a high-capacity latent representation using a deterministic autoencoder, then enables the KL term to structure the latent space. This curriculum, combined with layer-aware chunking of weight tensors, improves stability and out-of-distribution generalization when encoding LLM weights with heavy-tailed distributions.

10 retrieved papers
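The two ideas in this contribution can be sketched as follows; the chunk size, schedule lengths, beta_max, and all function names are illustrative assumptions, not values from the paper. Layer-aware chunking keeps the owning layer's name attached to every fixed-size weight chunk so the encoder can condition on structure, and the two-stage curriculum trains with the KL term disabled before annealing it in.

```python
import numpy as np

CHUNK = 4096  # illustrative chunk size

def layer_aware_chunks(state_dict):
    """Split each layer's flattened weights into fixed-size, zero-padded
    chunks, yielding (layer_name, chunk_index, chunk) so structural
    information survives the chunking."""
    for name, tensor in state_dict.items():
        flat = np.asarray(tensor).ravel()
        pad = (-len(flat)) % CHUNK
        flat = np.pad(flat, (0, pad))
        for i, chunk in enumerate(flat.reshape(-1, CHUNK)):
            yield name, i, chunk

def kl_weight(step, stage1_steps=10_000, beta_max=1e-4):
    """Two-stage curriculum: stage 1 trains a deterministic autoencoder
    (KL weight 0); stage 2 linearly ramps the KL weight to beta_max to
    structure the latent space."""
    if step < stage1_steps:
        return 0.0
    return min(beta_max, beta_max * (step - stage1_steps) / stage1_steps)
```

The reconstruction loss stays on throughout; only the KL weight follows the schedule, which matches the report's description of first learning a high-capacity representation and then enabling the KL term.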

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
