PSP: Prompt-Guided Self-Training Sampling Policy for Active Prompt Learning

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Active Prompt Learning · Reinforcement Learning · CLIP
Abstract:

Active Prompt Learning (APL) with vision-language models (e.g., CLIP) has attracted considerable attention for reducing the dependence on fully labeled datasets in downstream task adaptation. However, existing methods fail to explicitly leverage prompts to guide sample selection, so the selected samples do little to help the prompt template adapt to the downstream task, while valuable complementary information in the unselected samples is overlooked. To fill this gap, we propose a novel Prompt-Guided Self-Training Sampling Policy (PSP) for APL, which integrates Soft Actor-Critic with a customized real-pseudo hybrid reward and vectorized critics so that prompts guide sample selection toward samples that facilitate optimization of the prompt template, jointly considering both selected and unselected samples. Specifically, PSP comprises two components: a Vectorized Soft Actor-Critic Sampling Policy (VSSP) and an Uncertainty Augmented Self-Training (UST) mechanism. VSSP customizes a real-pseudo hybrid reward based on learned prompts and image features, which is fed into vectorized critics to estimate a Q-value for each sample and to compute gradients that optimize the actor, allowing it to refine its sampling policy end-to-end and identify the most informative samples for prompt learning. Moreover, UST leverages the CLIP model from the previous round to generate reliable pseudo-labeled data based on the uncertainty and confidence of averaged predictions, thereby deepening the model's understanding of the overall data distribution. Extensive experiments on diverse real-world datasets validate the effectiveness of PSP.

Disclaimer
This report is AI-GENERATED using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. The results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a Prompt-Guided Self-Training Sampling Policy (PSP) that combines reinforcement learning-based sample selection with self-training for active prompt learning. It resides in the 'Policy-Based Active Selection' leaf, which contains only two papers including this one. This is a notably sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting that policy-based approaches to active sample selection in prompt learning remain relatively underexplored compared to other branches like prompt optimization strategies or domain-specific adaptations.

The taxonomy reveals that most active learning work in this field concentrates on 'Uncertainty and Confidence-Based Selection' (three papers) and 'Open-Set and Open-Vocabulary Active Learning' (one paper), while the broader 'Prompt Optimization and Tuning Strategies' branch is substantially more populated with methods focused on context learning, visual prompts, and regularization. The paper's integration of policy networks with prompt-guided rewards positions it at the intersection of active selection and prompt optimization, diverging from purely uncertainty-driven or heuristic selection methods that dominate neighboring leaves. This cross-cutting approach appears less common in the current landscape.

Among the 21 candidates examined, the contribution-level analysis shows mixed novelty signals. For the core PSP framework, 10 candidates were examined with no clear refutations, suggesting the overall approach is reasonably distinctive. For the Vectorized Soft Actor-Critic component, only 1 candidate was examined, also with no refutation, though the limited search scope makes this less conclusive. For the Uncertainty Augmented Self-Training mechanism, 10 candidates were examined and 1 refutable match was found, indicating some overlap with existing self-training or uncertainty-based methods. Because of the limited search scale, these findings reflect top-K semantic matches rather than exhaustive coverage.

Based on the available signals from 21 examined candidates, the work appears to occupy a relatively novel position by combining policy-based selection with prompt-guided rewards, though the self-training component shows some prior overlap. The sparse population of its taxonomy leaf and the cross-cutting nature of its approach suggest potential novelty, but the limited search scope prevents definitive conclusions about how thoroughly the space of policy-based active prompt learning has been explored.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 1

Research Landscape Overview

Core task: active prompt learning with vision-language models. The field has evolved into a rich ecosystem organized around several complementary themes. Active Sample Selection and Data Efficiency explores how to choose the most informative examples for prompt tuning, often employing policy-based or uncertainty-driven strategies to minimize annotation costs. Prompt Optimization and Tuning Strategies focuses on the mechanics of learning effective prompts, ranging from gradient-based methods like Conditional Prompt Learning[3] to ensemble and meta-learning approaches. Domain-Specific and Task-Specific Adaptations addresses tailoring prompts to specialized contexts such as medical imaging or open-vocabulary detection, while Robustness and Distribution Challenges tackles generalization under domain shift and out-of-distribution scenarios. Meanwhile, the Federated and Privacy-Preserving Learning, Unsupervised and Low-Resource Learning, and Continual and Lifelong Learning branches investigate settings where data is decentralized, scarce, or arrives sequentially. Finally, Theoretical Foundations and Model Analysis provides a deeper understanding of why and how prompts work.

Within the Active Sample Selection branch, a handful of works treat sample selection as a learnable policy problem, aiming to identify which examples yield the greatest improvement in prompt quality. PSP Prompt Sampling[0] sits squarely in this policy-based active selection cluster, emphasizing strategic sampling to boost data efficiency. It shares common ground with Active Prompt Learning[1] and Active Prompt Priors[2], both of which also prioritize intelligent example selection but may differ in their underlying selection criteria or integration with prompt optimization loops.
Compared to broader prompt tuning methods like Conditional Prompt Learning[3] or Unsupervised Prompt Learning[4], PSP Prompt Sampling[0] places greater emphasis on the active curation of training samples rather than solely on the prompt representation itself. This positioning highlights an ongoing tension in the field: whether gains come primarily from better prompts or from better data, and how these two dimensions can be jointly optimized.

Claimed Contributions

Prompt-Guided Self-Training Sampling Policy (PSP) for Active Prompt Learning

The authors introduce PSP, a framework that combines Soft Actor-Critic with a tailored real-pseudo hybrid reward and vectorized critics to explicitly leverage prompts for guiding sample selection in active prompt learning. This approach bridges sample selection and prompt learning by jointly considering both selected and unselected samples.

10 retrieved papers
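The claimed framework alternates rounds of policy-driven selection, oracle labeling, and prompt tuning. As a rough illustration of that round structure (not the authors' implementation; the function name, the flat per-sample policy scores, and the budget value are all hypothetical), one selection round might be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def sketch_psp_round(pool_feats, labeled_idx, budget, policy_scores):
    """One hypothetical PSP round: the policy scores the unlabeled pool,
    the top-`budget` samples are queried for real labels, and the rest
    remain available for pseudo-labeling by the self-training step."""
    unlabeled = np.setdiff1d(np.arange(len(pool_feats)), labeled_idx)
    # Rank the unlabeled samples by the actor's selection score, descending.
    ranked = unlabeled[np.argsort(policy_scores[unlabeled])[::-1]]
    selected = ranked[:budget]    # sent to the oracle for real labels
    unselected = ranked[budget:]  # candidates for pseudo-labels
    return selected, unselected

# Toy usage: a pool of 10 samples, 2 already labeled, budget of 3.
feats = rng.normal(size=(10, 4))
scores = rng.random(10)
sel, unsel = sketch_psp_round(feats, np.array([0, 1]), 3, scores)
```

By construction, every selected sample scores at least as high as every unselected one, and previously labeled samples are never re-queried.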
Vectorized Soft Actor-Critic Sampling Policy (VSSP)

VSSP is a component that customizes a real-pseudo hybrid reward using learned prompts and image features, which is then fed into vectorized critics to estimate Q-values for each sample and compute actor gradients. This enables the actor to refine its sampling policy in an end-to-end manner to identify the most informative samples for prompt learning.

1 retrieved paper
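The vectorized-critic and actor-gradient machinery can be illustrated at toy scale. The sketch below is hypothetical, not the paper's implementation: it scores a candidate pool with a small linear critic ensemble, takes the ensemble minimum as the per-sample Q-value (the clipped double-Q trick standard in Soft Actor-Critic), and then ascends the exact gradient of the entropy-regularized objective J = E_π[Q] + αH(π) for a softmax selection policy over the pool:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def actor_gradient(logits, q_values, alpha=0.1):
    """Exact gradient of J = E_pi[Q] + alpha * H(pi) with respect to the
    logits of a softmax selection policy over the candidate pool."""
    pi = softmax(logits)
    adv = q_values - alpha * np.log(pi)  # soft advantage per sample
    return pi * (adv - pi @ adv)         # centered softmax policy gradient

# "Vectorized critics": an ensemble of K linear critics, each mapping a
# sample feature vector to a scalar Q; the ensemble minimum serves as the
# per-sample value, as in SAC's clipped double-Q estimate.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))      # 6 candidate samples, 4-dim features
critics = rng.normal(size=(2, 4))    # K = 2 critic weight vectors
q = (critics @ feats.T).min(axis=0)  # per-sample Q-values, shape (6,)

logits = np.zeros(6)
for _ in range(200):                 # plain gradient ascent on J
    logits += 0.5 * actor_gradient(logits, q)
pi = softmax(logits)
```

After ascent, the policy concentrates on high-Q candidates while the entropy term keeps it stochastic; the gradient expression can be verified against a finite-difference check of J.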
Uncertainty Augmented Self-Training (UST) mechanism

UST is a mechanism that leverages the teacher CLIP model to generate reliable pseudo-labeled data by evaluating uncertainty and confidence of average predictions across multiple augmentations. This mechanism extracts complementary information from unselected samples to deepen understanding of the overall data distribution.

10 retrieved papers
Can Refute
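The UST filter described above can be made concrete. Assuming, hypothetically, that the previous round's model yields class probabilities for several augmented views of each unlabeled image, a confidence-plus-entropy gate over the averaged predictions might look like the following (function name and both thresholds are illustrative):

```python
import numpy as np

def ust_pseudo_labels(probs_aug, conf_thr=0.8, unc_thr=0.5):
    """Hypothetical UST-style filter. `probs_aug` has shape
    (n_aug, n_samples, n_classes): class probabilities from the
    previous-round model for several augmented views of each image.
    A sample is pseudo-labeled only if its averaged prediction is both
    confident (high max probability) and certain (low entropy)."""
    mean_p = probs_aug.mean(axis=0)                       # average over views
    conf = mean_p.max(axis=1)                             # confidence
    ent = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)  # predictive entropy
    keep = (conf >= conf_thr) & (ent <= unc_thr)
    return np.where(keep)[0], mean_p.argmax(axis=1)[keep]

# Toy check: sample 0 is stable and confident across views; sample 1's
# prediction flips between augmentations, so its average is uninformative.
p = np.array([
    [[0.9, 0.1], [0.9, 0.1]],   # view 1
    [[0.9, 0.1], [0.1, 0.9]],   # view 2 flips sample 1's prediction
])
idx, labels = ust_pseudo_labels(p)
```

Here only sample 0 passes both gates (average [0.9, 0.1]: confidence 0.9, entropy ≈ 0.33), while sample 1 averages to [0.5, 0.5] and is rejected; this is the sense in which augmentation-averaged uncertainty screens out unreliable pseudo-labels.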

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
