Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning

ICLR 2026 Conference Submission, Anonymous Authors

OpenReview Score: 6.0

Keywords: Vision-Language Model, Few-shot Transfer, Image Classification

Abstract:

Pretrained vision-language models (VLMs), such as CLIP, have shown remarkable potential in few-shot image classification and led to numerous effective transfer learning strategies. These methods leverage the pretrained knowledge of VLMs to enable effective domain adaptation while mitigating overfitting through parameter-efficient tuning or instance-based consistency constraints. However, such regularizations often neglect the geometric structure of data distribution, which may lead to distortion of the overall semantic representation. To overcome this limitation, we propose a novel fine-tuning method, Manifold-Preserving and Sculpting Tuning (MPS-Tuning). Regarding the data distribution in feature space as a semantic manifold, MPS-Tuning explicitly constrains the intrinsic geometry of this manifold while further sculpting it to enhance class separability. Specifically, MPS-Tuning preserves both macroscopic and microscopic topological structures of the original manifold by aligning Gram matrices of features before and after fine-tuning. Theoretically, this constraint is shown to approximate an upper bound of the Gromov-Wasserstein distance. Furthermore, features from the image and text modalities are paired, and pairwise similarities are optimized to enhance the manifold’s class discriminability. Extensive experiments demonstrate that MPS-Tuning significantly improves model performance while effectively preserving the structure of the semantic manifold. The code will be released.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Manifold-Preserving and Sculpting Tuning (MPS-Tuning), a fine-tuning framework that constrains the intrinsic geometry of feature manifolds while enhancing class separability. It resides in the Manifold Topology Alignment leaf, which contains three papers total. This leaf sits within the broader Geometric and Topological Structure Preservation branch, indicating a relatively sparse but focused research direction. The taxonomy shows that explicit geometric constraints during VLM fine-tuning remain less explored compared to adapter-based or prompt-based methods, suggesting the paper addresses a niche but theoretically motivated problem space.

The taxonomy reveals neighboring directions such as 3D Geometric Distillation and Cross-Modal Semantic Hierarchy Modeling within the same parent branch, alongside parallel branches like Adapter-Based Parameter-Efficient Fine-Tuning and Prompt-Based Tuning. The sibling papers in Manifold Topology Alignment—Homology Consistency Tuning and Topology-Aware CLIP—both enforce topological invariants but differ in mechanism. MPS-Tuning's use of Gram matrix alignment and Gromov-Wasserstein distance approximation appears distinct from homological constraints or direct topological priors, positioning it as a complementary approach within the geometric preservation paradigm rather than a direct extension of existing methods.

Among the 21 candidates examined, none clearly refutes any of the three core contributions. The MPS-Tuning framework was compared against 10 candidates with zero refutable overlaps; Manifold Alignment Regularization was compared against 1 candidate with no refutation; and the Hierarchical Manifold Sculpting strategy was compared against 10 candidates, also with zero refutations. Within this limited scope of top-K semantic matches and citation expansions, no prior work was found that explicitly combines Gram matrix alignment with hierarchical sculpting for VLM fine-tuning. However, the small candidate pool and sparse taxonomy leaf indicate that the analysis covers a narrow slice of the literature rather than an exhaustive survey.

Based on the limited search scope of 21 candidates, the work appears to occupy a relatively unexplored intersection of manifold geometry and VLM adaptation. The absence of refutable prior work within this sample, combined with the sparse taxonomy leaf, suggests potential novelty in the specific technical approach. However, the analysis does not capture broader geometric learning literature or alternative manifold-preserving methods outside the top-K semantic neighborhood, leaving open the possibility of related work in adjacent fields or earlier geometric deep learning research.

Taxonomy

Figure placeholder (interactive taxonomy tree not captured). Labels: core-task taxonomy papers, claimed contributions, contribution candidate papers compared, refutable paper.

Research Landscape Overview

Core task: manifold-preserving fine-tuning of vision-language models for few-shot learning. The field addresses how to adapt large-scale vision-language models (VLMs) to downstream tasks with limited labeled data while maintaining the geometric and semantic structure learned during pretraining. The taxonomy reveals several complementary strategies: Geometric and Topological Structure Preservation focuses on explicitly safeguarding manifold properties and topological invariants during adaptation; Adapter-Based Parameter-Efficient Fine-Tuning introduces lightweight modules to minimize disruption to pretrained weights; Prompt-Based Tuning and Meta-Learning optimizes input-level or context-level representations; Graph-Based and Training-Free Adaptation leverages relational structures without gradient updates; Robust Fine-Tuning for Distribution Shift targets generalization under domain changes; and Application-Specific VLM Adaptation tailors methods to particular use cases such as robotics or document understanding. Representative works like Homology Consistency Tuning[2] and Topology-Aware CLIP[3] illustrate how topological constraints can guide fine-tuning, while methods such as Complementary Subspace LoRA[7] and Singular Value Adaptation[6] demonstrate parameter-efficient pathways that preserve pretrained structure.

A particularly active line of research centers on geometric and topological guarantees during adaptation. Preserve and Sculpt[0] sits within the Manifold Topology Alignment cluster, emphasizing the preservation of intrinsic manifold structure while sculpting task-specific decision boundaries. This contrasts with nearby approaches: Homology Consistency Tuning[2] enforces homological invariants to maintain topological features across layers, whereas Topology-Aware CLIP[3] integrates topological priors directly into the CLIP framework. Compared to these neighbors, Preserve and Sculpt[0] appears to balance explicit manifold constraints with flexible boundary refinement, addressing the trade-off between retaining pretrained knowledge and achieving task-specific discrimination. Across branches, open questions persist around the interplay between parameter efficiency, topological fidelity, and generalization under distribution shift, with methods like Structure-Induced Gradient[5] and Context-Aware Label Propagation[4] exploring alternative regularization and propagation strategies to navigate these challenges.

Claimed Contributions

Manifold-Preserving and Sculpting Tuning (MPS-Tuning) framework

10 retrieved papers

The authors introduce MPS-Tuning, a framework that treats data distribution in feature space as a semantic manifold and explicitly constrains its intrinsic geometry while sculpting it to enhance class separability. This approach addresses limitations of existing methods that neglect geometric structure preservation during fine-tuning.

Manifold Alignment Regularization with theoretical connection to Gromov-Wasserstein distance

1 retrieved paper

The authors propose Manifold Alignment Regularization (MAR) that preserves both macroscopic and microscopic topological structures by aligning Gram matrices of features before and after fine-tuning. They theoretically prove that this constraint approximates an upper bound of the Gromov-Wasserstein distance, providing the first such connection in VLM fine-tuning.
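
To make the Gram-matrix idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The L2 normalization, the mean-squared-difference form of the penalty, and the names `l2_normalize`, `gram_matrix`, and `gram_alignment_loss` are all assumptions made for illustration; the report does not specify the exact loss.

```python
import math

def l2_normalize(rows):
    """Scale each feature vector to unit Euclidean length."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row])
    return out

def gram_matrix(rows):
    """Pairwise inner products; for unit vectors these are cosine similarities."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]

def gram_alignment_loss(feats_before, feats_after):
    """Mean squared difference between the two Gram matrices.

    Assumed loss form for illustration only; the paper may use a different one.
    """
    g0 = gram_matrix(l2_normalize(feats_before))
    g1 = gram_matrix(l2_normalize(feats_after))
    n = len(g0)
    return sum((g0[i][j] - g1[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

# Toy batch of three 2-D features (made-up numbers).
pretrained = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A 90-degree rotation keeps every pairwise similarity, so the penalty is ~0.
rotated = [[-y, x] for x, y in pretrained]

# Collapsing all features onto one direction distorts the pairwise geometry.
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

loss_rotated = gram_alignment_loss(pretrained, rotated)
loss_collapsed = gram_alignment_loss(pretrained, collapsed)
print(loss_rotated, loss_collapsed)
```

The toy case shows why a Gram-based penalty constrains intrinsic geometry rather than absolute positions: a rigid rotation of the features is not penalized, while a collapse is. One standard way such a term can relate to the Gromov-Wasserstein distance is that GW minimizes a pairwise discrepancy over all couplings between two point sets, and the identity coupling (each sample matched to itself, uniform weights) is one feasible choice, so its cost is an upper bound on the minimum. Whether the paper's proof follows this route is not stated in this report.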

Hierarchical Manifold Sculpting optimization strategy

10 retrieved papers

The authors develop Hierarchical Manifold Sculpting (HMS), which uses a multimodal query-support matching task to optimize pairwise similarities between image and text modalities. This sculpting mechanism extends from output features to intermediate layer features through pseudo-forward projection, enhancing manifold discriminability.
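
As a rough illustration of what a query-support matching objective over image and text features could look like, here is a minimal plain-Python sketch. It is not the authors' method: the softmax cross-entropy form, the temperature value, and the names `cosine` and `matching_loss` are assumptions, and the pseudo-forward projection of intermediate-layer features is not modeled.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def matching_loss(image_feats, labels, text_feats, temperature=0.07):
    """Average cross-entropy of each image query over the text supports.

    Each image feature is a query, each class text feature is a support, and
    the loss pushes the query toward the support of its own class.
    The temperature of 0.07 is an assumed value for illustration.
    """
    total = 0.0
    for feat, label in zip(image_feats, labels):
        logits = [cosine(feat, t) / temperature for t in text_feats]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[label]
    return total / len(image_feats)

# Two class text features and two labeled image features (made-up numbers).
text_feats = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each image lies near its class text
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each image lies near the wrong class

loss_aligned = matching_loss(aligned, labels, text_feats)
loss_swapped = matching_loss(swapped, labels, text_feats)
print(loss_aligned, loss_swapped)
```

In this toy setting the loss is close to zero when image features sit near the text feature of their class and large when they sit near the wrong one, which is the sense in which optimizing pairwise image-text similarities can increase class separability.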

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[2] Homology consistency constrained efficient tuning for vision-language models

Zhendong Mao, Huatian Zhang, Lei Zhang, Yongdong Zhang (2024)

[3] Topology-Aware CLIP Few-Shot Learning

Dazhi Huang (2025)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Manifold-Preserving and Sculpting Tuning (MPS-Tuning) framework

[26] Few-shot classification based on manifold metric learning (verdict: Cannot Refute)

[27] Charting the right manifold: Manifold mixup for few-shot learning (verdict: Cannot Refute)

[28] Surface vision mamba: Leveraging bidirectional state space model for efficient spherical manifold representation (verdict: Cannot Refute)

[29] … Prediction of Coal Mine Water Inrush Probability: An Integrated Approach Driven by Gaussian Mixture Modeling with Manifold Learning and Metaheuristic … (verdict: Cannot Refute)

[30] Embedding propagation: Smoother manifold for few-shot classification (verdict: Cannot Refute)

[31] Real-Time Intelligent Recognition and Precise Drilling in Strongly Heterogeneous Formations Based on Multi-Parameter Logging While Drilling and Drilling … (verdict: Cannot Refute)

[32] Augmenting small biomedical datasets using generative AI methods based on self-organizing neural networks (verdict: Cannot Refute)

[33] Few-shot learning with geometric constraints (verdict: Cannot Refute)

[34] Few-Shot Learning With Manifold-Enhanced LLM for Handling Anomalous Perception Inputs in Autonomous Driving (verdict: Cannot Refute)

[35] Few-shot Learning over Graphs Using Topological Prompts (verdict: Cannot Refute)

Contribution: Manifold Alignment Regularization with theoretical connection to Gromov-Wasserstein distance

[36] Information-Geometric Perspectives on Merging Variational Foundation Models (verdict: Cannot Refute)

Contribution: Hierarchical Manifold Sculpting optimization strategy

[16] Hierarchical Consensus Hashing for Cross-Modal Retrieval (verdict: Cannot Refute)

[17] Hope: A hierarchical perspective for semi-supervised 2d-3d cross-modal retrieval (verdict: Cannot Refute)

[18] Supervised Hierarchical Online Hashing for Cross-modal Retrieval (verdict: Cannot Refute)

[19] Hierarchical set-to-set representation for 3-d cross-modal retrieval (verdict: Cannot Refute)

[20] Parameter Hierarchical Optimization for Visible-Infrared Person Re-Identification (verdict: Cannot Refute)

[21] Hi-CMD: Hierarchical cross-modality disentanglement for visible-infrared person re-identification (verdict: Cannot Refute)

[22] Cross-modality hierarchical clustering and refinement for unsupervised visible-infrared person re-identification (verdict: Cannot Refute)

[23] Hierarchical discriminative learning for visible thermal person re-identification (verdict: Cannot Refute)

[24] Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection (verdict: Cannot Refute)

[25] A hierarchical cross-modal spatial fusion network for multimodal emotion recognition (verdict: Cannot Refute)
