Post-hoc Probabilistic Vision-Language Models

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Uncertainty Quantification, Active Fine-Tuning, Bayesian Deep Learning, Vision-Language Models
Abstract:

Vision-language models (VLMs), such as CLIP and SigLIP, have achieved remarkable success in classification, retrieval, and generative tasks. To do so, VLMs deterministically map images and text descriptions to a joint latent space in which their similarity is assessed via cosine similarity. However, a deterministic mapping of inputs fails to capture uncertainties over concepts arising from domain shifts when the models are used in downstream tasks. In this work, we propose post-hoc uncertainty estimation in VLMs that does not require additional training. Our method leverages a Bayesian posterior approximation over the last layers of VLMs and analytically quantifies uncertainties over cosine similarities. We demonstrate its effectiveness for uncertainty quantification and support-set selection in active learning. Compared to baselines, we obtain improved and well-calibrated predictive uncertainties, interpretable uncertainty estimates, and sample-efficient active learning. Our results show promise for safety-critical applications of large-scale models.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes BayesVLM, a post-hoc Bayesian uncertainty quantification method for vision-language models using Laplace approximation over final layers. It resides in the 'Post-hoc Probabilistic Embedding Approaches' leaf, which contains only three papers total (including this work and two siblings: Probabilistic Embeddings Frozen and Intra Class Probabilistic). This is a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting the specific combination of post-hoc Bayesian methods and VLM embeddings remains underexplored compared to calibration-focused or training-based probabilistic approaches.

The taxonomy reveals neighboring leaves focused on training-based probabilistic modeling (four papers) and hidden representation-based uncertainty (three papers), indicating alternative strategies for VLM uncertainty. The parent branch 'Uncertainty Estimation Methods and Frameworks' encompasses multiple approaches, while sibling branches address calibration techniques (13 papers across four leaves) and application-specific evaluation (17 papers). The scope note clarifies that post-hoc methods must avoid retraining, distinguishing this work from training-based probabilistic VLMs like those requiring fine-tuning or learned uncertainty predictors. The paper's analytical uncertainty propagation connects to semantic uncertainty quantification approaches but differs by operating directly on embedding distributions rather than output consistency.

Among 28 candidates examined across three contributions, only one refutable pair emerged. The core BayesVLM framework (Contribution 1) examined nine candidates with zero refutations, suggesting limited direct prior work on Laplace-approximated VLM embeddings. Contribution 2 (analytical cosine similarity distributions) examined nine candidates and found one potential overlap, indicating some existing work on uncertainty propagation in similarity metrics. Contribution 3 (active learning demonstrations) examined ten candidates without refutation, though this may reflect the application focus rather than methodological novelty. The limited search scope (top-K semantic matches plus citations) means these statistics capture nearby prior work but not exhaustive field coverage.

Based on the constrained literature search, the work appears to occupy a relatively novel position, combining post-hoc Bayesian inference with VLM embeddings. The sparse population of its taxonomy leaf and the low refutation rate across contributions suggest only limited overlap with existing methods, though the analytical uncertainty propagation has seen some prior exploration. The analysis covers semantically proximate papers but cannot confirm the absence of related work in adjacent research communities or in recent preprints outside the search scope.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Paper: 1

Research Landscape Overview

Core task: post-hoc uncertainty quantification for vision-language models. The field addresses how to estimate and calibrate confidence in VLM predictions without retraining from scratch. The taxonomy reveals several complementary directions:

- Uncertainty Estimation Methods and Frameworks develop techniques to quantify model uncertainty through probabilistic embeddings, Bayesian approaches, and ensemble-like strategies;
- Confidence Calibration Techniques focus on adjusting model outputs to align predicted confidence with actual accuracy, often via temperature scaling or contrastive methods;
- Uncertainty-aware Applications and Evaluation explore how uncertainty estimates can guide downstream tasks and benchmarks;
- Hallucination Detection and Mitigation targets the specific problem of identifying and reducing spurious or unfaithful outputs;
- Related Vision-Language Model Topics cover broader VLM concerns such as robustness and modality alignment.

Representative works like ProbVLM[11] and Multimodal Uncertainty Encoders[28] illustrate early probabilistic embedding strategies, while Calibrated Robust Finetuning[4] and Attentional Vision Calibration[3] exemplify calibration-focused methods. A particularly active line of work centers on post-hoc probabilistic embedding approaches, which retrofit uncertainty into frozen or minimally adapted VLMs. Posthoc Probabilistic VLM[0] sits squarely in this cluster, alongside Probabilistic Embeddings Frozen[15] and Intra Class Probabilistic[37], all aiming to capture distributional information in embedding space without extensive retraining. This contrasts with calibration-centric methods like Attentional Vision Calibration[3] or Unveiling Uncertainty[5], which adjust confidence scores but may not model full distributional uncertainty.
A key trade-off is between computational efficiency and expressiveness: probabilistic embeddings can capture richer uncertainty but require careful design to remain tractable, while calibration techniques are often simpler yet may not address epistemic uncertainty as directly. Open questions include how to best integrate these uncertainty estimates into real-world applications, balance calibration with hallucination mitigation, and evaluate uncertainty quality across diverse VLM architectures and tasks.
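As a point of contrast with the probabilistic-embedding approaches, the temperature scaling mentioned above is a one-parameter post-hoc calibration. The sketch below illustrates the idea only; the fitting of `T` on held-out data is omitted, and the function name is our own.

```python
import numpy as np

def temperature_scale(logits, T):
    """Post-hoc calibration: divide logits by a scalar temperature T > 0
    fitted on a held-out set; T > 1 softens overconfident predictions."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

Note that this rescales confidence without changing the predicted class, which is why calibration alone cannot express the epistemic uncertainty that the probabilistic-embedding methods target.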

Claimed Contributions

BayesVLM: Post-hoc probabilistic vision-language models using Laplace approximation

The authors introduce BayesVLM, a training-free post-hoc uncertainty quantification method for vision-language models that leverages Laplace approximation over the last layers of VLM encoders. This approach enables uncertainty estimation without requiring architectural modifications, retraining, or additional training procedures.

Retrieved papers compared: 9

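The last-layer Laplace idea behind this contribution can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the diagonal (empirical-Fisher) curvature approximation, and the isotropic Gaussian prior are simplifying assumptions made here for illustration.

```python
import numpy as np

def diagonal_laplace_posterior(per_example_grads, w_map, prior_precision=1.0):
    """Diagonal Laplace approximation around the MAP weights of a last layer.

    per_example_grads: (N, D) gradients of the loss w.r.t. the flattened
    last-layer weights, one row per training example. The squared-gradient
    sum (empirical Fisher) stands in for the Hessian diagonal.
    Returns the posterior mean (= w_map) and diagonal posterior variance.
    """
    fisher_diag = (per_example_grads ** 2).sum(axis=0)
    posterior_var = 1.0 / (fisher_diag + prior_precision)
    return w_map, posterior_var
```

Because only the last layer is treated probabilistically and the curvature is estimated after the fact, no retraining of the VLM is needed, which matches the training-free claim.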
Analytical distribution over cosine similarities for efficient uncertainty propagation

The authors develop a novel Bayesian formulation for VLMs by introducing independent probabilistic models for each modality and deriving a closed-form Gaussian approximation (ProbCosine) of the distribution over cosine similarities. This enables efficient propagation of uncertainties from model parameters to VLM outputs.

Retrieved papers compared: 9 (1 can refute)

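The closed-form propagation described above can be sketched for the simpler case of an inner product between independent diagonal-Gaussian embeddings; the paper's ProbCosine additionally handles the normalization in the cosine, which is omitted here. `prob_dot` and the assumption of pre-normalized embedding means are illustrative.

```python
import numpy as np

def prob_dot(mu_img, var_img, mu_txt, var_txt):
    """Mean and variance of s = z_img . z_txt for independent Gaussians
    z_img ~ N(mu_img, diag(var_img)) and z_txt ~ N(mu_txt, diag(var_txt)).
    Both moments are exact for the inner product under independence."""
    mean = mu_img @ mu_txt
    var = np.sum(mu_img**2 * var_txt + mu_txt**2 * var_img + var_img * var_txt)
    return mean, var
```

When both variances are zero this reduces to the deterministic similarity, so the probabilistic score strictly generalizes the standard VLM score.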
Demonstration of BayesVLM effectiveness in zero-shot classification and active learning

The authors empirically validate BayesVLM across multiple benchmarks, showing improved calibration and uncertainty estimates in zero-shot classification tasks, and demonstrating sample-efficient active learning through uncertainty-based data selection using BALD and EPIG acquisition functions.

Retrieved papers compared: 10
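The BALD acquisition function mentioned above is standard and can be computed from Monte Carlo samples of class probabilities. The sketch below is generic, not the paper's code, and the procedure that produces `prob_samples` (e.g., drawing last-layer weights from the approximate posterior) is assumed.

```python
import numpy as np

def bald_scores(prob_samples, eps=1e-12):
    """BALD mutual information from posterior samples of class probabilities.

    prob_samples: (S, N, C) probabilities for S posterior draws, N candidate
    inputs, C classes. Score = H[E_s p] - E_s H[p]; high scores mark inputs
    on which the posterior draws disagree (epistemic uncertainty)."""
    mean_p = prob_samples.mean(axis=0)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)
    mean_entropy = -(prob_samples * np.log(prob_samples + eps)).sum(axis=-1).mean(axis=0)
    return entropy_of_mean - mean_entropy
```

Inputs with the highest scores would be the ones selected for labeling in the active-learning loop.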

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
