Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Vision-Language Model, Few-shot Transfer, Image Classification
Abstract:

Pretrained vision-language models (VLMs), such as CLIP, have shown remarkable potential in few-shot image classification and have led to numerous effective transfer learning strategies. These methods leverage the pretrained knowledge of VLMs to enable effective domain adaptation while mitigating overfitting through parameter-efficient tuning or instance-based consistency constraints. However, such regularization often neglects the geometric structure of the data distribution, which may distort the overall semantic representation. To overcome this limitation, we propose a novel fine-tuning method, Manifold-Preserving and Sculpting Tuning (MPS-Tuning). Treating the data distribution in feature space as a semantic manifold, MPS-Tuning explicitly constrains the intrinsic geometry of this manifold while further sculpting it to enhance class separability. Specifically, MPS-Tuning preserves both the macroscopic and microscopic topological structures of the original manifold by aligning the Gram matrices of features before and after fine-tuning. Theoretically, this constraint is shown to approximate an upper bound of the Gromov-Wasserstein distance. Furthermore, features from the image and text modalities are paired, and their pairwise similarities are optimized to enhance the manifold's class discriminability. Extensive experiments demonstrate that MPS-Tuning significantly improves model performance while effectively preserving the structure of the semantic manifold. The code will be released.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Manifold-Preserving and Sculpting Tuning (MPS-Tuning), a fine-tuning framework that constrains the intrinsic geometry of feature manifolds while enhancing class separability. It resides in the Manifold Topology Alignment leaf, which contains three papers total. This leaf sits within the broader Geometric and Topological Structure Preservation branch, indicating a relatively sparse but focused research direction. The taxonomy shows that explicit geometric constraints during VLM fine-tuning remain less explored compared to adapter-based or prompt-based methods, suggesting the paper addresses a niche but theoretically motivated problem space.

The taxonomy reveals neighboring directions such as 3D Geometric Distillation and Cross-Modal Semantic Hierarchy Modeling within the same parent branch, alongside parallel branches like Adapter-Based Parameter-Efficient Fine-Tuning and Prompt-Based Tuning. The sibling papers in Manifold Topology Alignment—Homology Consistency Tuning and Topology-Aware CLIP—both enforce topological invariants but differ in mechanism. MPS-Tuning's use of Gram matrix alignment and Gromov-Wasserstein distance approximation appears distinct from homological constraints or direct topological priors, positioning it as a complementary approach within the geometric preservation paradigm rather than a direct extension of existing methods.

Among the 21 candidates examined, none clearly refute the three core contributions. The MPS-Tuning framework was compared against 10 candidates with zero refutable overlaps; the Manifold Alignment Regularization was checked against 1 candidate with no refutation; and the Hierarchical Manifold Sculpting strategy was reviewed against 10 candidates, again with zero refutations. Within this limited search scope, covering the top-K semantic matches and citation expansions, no prior work explicitly combines Gram matrix alignment with hierarchical sculpting for VLM fine-tuning. However, the small candidate pool and the sparse taxonomy leaf indicate that the analysis covers a narrow slice of the literature rather than an exhaustive survey.

Based on the limited search scope of 21 candidates, the work appears to occupy a relatively unexplored intersection of manifold geometry and VLM adaptation. The absence of refutable prior work within this sample, combined with the sparse taxonomy leaf, suggests potential novelty in the specific technical approach. However, the analysis does not capture broader geometric learning literature or alternative manifold-preserving methods outside the top-K semantic neighborhood, leaving open the possibility of related work in adjacent fields or earlier geometric deep learning research.

Taxonomy

Core-task Taxonomy Papers: 15
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: manifold-preserving fine-tuning of vision-language models for few-shot learning. The field addresses how to adapt large-scale vision-language models (VLMs) to downstream tasks with limited labeled data while maintaining the geometric and semantic structure learned during pretraining. The taxonomy reveals several complementary strategies: Geometric and Topological Structure Preservation focuses on explicitly safeguarding manifold properties and topological invariants during adaptation; Adapter-Based Parameter-Efficient Fine-Tuning introduces lightweight modules to minimize disruption to pretrained weights; Prompt-Based Tuning and Meta-Learning optimizes input-level or context-level representations; Graph-Based and Training-Free Adaptation leverages relational structures without gradient updates; Robust Fine-Tuning for Distribution Shift targets generalization under domain changes; and Application-Specific VLM Adaptation tailors methods to particular use cases such as robotics or document understanding.

Representative works like Homology Consistency Tuning[2] and Topology-Aware CLIP[3] illustrate how topological constraints can guide fine-tuning, while methods such as Complementary Subspace LoRA[7] and Singular Value Adaptation[6] demonstrate parameter-efficient pathways that preserve pretrained structure. A particularly active line of research centers on geometric and topological guarantees during adaptation. Preserve and Sculpt[0] sits within the Manifold Topology Alignment cluster, emphasizing the preservation of intrinsic manifold structure while sculpting task-specific decision boundaries. This contrasts with nearby approaches: Homology Consistency Tuning[2] enforces homological invariants to maintain topological features across layers, whereas Topology-Aware CLIP[3] integrates topological priors directly into the CLIP framework.

Compared to these neighbors, Preserve and Sculpt[0] appears to balance explicit manifold constraints with flexible boundary refinement, addressing the trade-off between retaining pretrained knowledge and achieving task-specific discrimination. Across branches, open questions persist around the interplay between parameter efficiency, topological fidelity, and generalization under distribution shift, with methods like Structure-Induced Gradient[5] and Context-Aware Label Propagation[4] exploring alternative regularization and propagation strategies to navigate these challenges.

Claimed Contributions

Manifold-Preserving and Sculpting Tuning (MPS-Tuning) framework

The authors introduce MPS-Tuning, a framework that treats data distribution in feature space as a semantic manifold and explicitly constrains its intrinsic geometry while sculpting it to enhance class separability. This approach addresses limitations of existing methods that neglect geometric structure preservation during fine-tuning.

10 retrieved papers
Manifold Alignment Regularization with theoretical connection to Gromov-Wasserstein distance

The authors propose Manifold Alignment Regularization (MAR) that preserves both macroscopic and microscopic topological structures by aligning Gram matrices of features before and after fine-tuning. They theoretically prove that this constraint approximates an upper bound of the Gromov-Wasserstein distance, providing the first such connection in VLM fine-tuning.

1 retrieved paper
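To make the Gram-matrix alignment concrete: the claimed regularizer penalizes changes in pairwise feature similarity between the frozen and the fine-tuned encoder. The following is a minimal NumPy sketch under our own assumptions (function and variable names are illustrative, not the authors' implementation); it measures the Frobenius discrepancy between cosine-similarity Gram matrices computed before and after fine-tuning:

```python
import numpy as np

def gram_alignment_loss(feats_pre, feats_post):
    """Sketch of a manifold-alignment regularizer: mean squared
    difference between the Gram (cosine-similarity) matrices of a
    feature batch before and after fine-tuning."""
    # L2-normalize rows so Gram entries are cosine similarities
    pre = feats_pre / np.linalg.norm(feats_pre, axis=1, keepdims=True)
    post = feats_post / np.linalg.norm(feats_post, axis=1, keepdims=True)
    G_pre = pre @ pre.T    # pairwise structure under the frozen encoder
    G_post = post @ post.T # pairwise structure after adaptation
    n = G_pre.shape[0]
    return np.sum((G_pre - G_post) ** 2) / (n * n)
```

Because the features are normalized first, the loss is invariant to uniform rescaling of the representation and only reacts when the relative geometry of the batch changes, which matches the report's description of preserving macroscopic and microscopic pairwise structure.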
Hierarchical Manifold Sculpting optimization strategy

The authors develop Hierarchical Manifold Sculpting (HMS), which uses a multimodal query-support matching task to optimize pairwise similarities between image and text modalities. This sculpting mechanism extends from output features to intermediate layer features through pseudo-forward projection, enhancing manifold discriminability.

10 retrieved papers
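The query-support matching objective described above can be sketched as a cross-entropy over temperature-scaled cosine similarities between query image features and class text features. This is a hedged illustration of the matching step only, under our own assumptions (names and the temperature value are illustrative); the pseudo-forward projection to intermediate layers is not reproduced here:

```python
import numpy as np

def query_support_loss(img_feats, txt_feats, labels, tau=0.07):
    """Sketch of a multimodal query-support matching loss: softmax
    cross-entropy over cosine similarities between query image
    features and per-class text (support) features."""
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / tau                     # queries vs. supports
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Minimizing this loss pulls each image feature toward its paired text feature and pushes it away from the other supports, which is the "sculpting" effect on class separability that the report attributes to HMS.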

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
