Frozen Priors, Fluid Forecasts: Prequential Uncertainty for Low-Data Deployment with Pretrained Generative Models

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: predictive uncertainty quantification, prequential inference, measure-valued martingales, frozen generative models
Abstract:

Deploying ML systems with only a few real samples makes operational metrics (such as alert rates or mean scores) highly unstable. Existing uncertainty quantification (UQ) methods fail here: frequentist intervals ignore the deployed predictive rule, Bayesian posteriors assume continual refitting, and conformal methods offer per-example rather than long-run guarantees. We introduce a forecast-first UQ framework that blends the empirical distribution with a frozen pretrained generator using a unique Dirichlet schedule, ensuring time-consistent forecasts. Uncertainty is quantified via martingale posteriors: a lightweight, likelihood-free resampling method that simulates future forecasts under the deployed rule, yielding sharp, well-calibrated intervals for both current and long-run metrics without retraining or density evaluation. A single hyperparameter, set by a small-n minimax criterion, balances sampling variance against model-data mismatch; for bounded scores, we provide finite-time drift guarantees. We also show how this framework informs optimal retraining decisions. Applicable off-the-shelf to frozen generators (flows, diffusion, autoregressive models, GANs) and linear metrics (means, tails, NLL), it outperforms bootstrap baselines across vision and language benchmarks (WikiText-2, CIFAR-10, and SVHN); e.g., it achieves ~90% coverage on GPT-2 with 20 samples vs. 37% for bootstrap. Importantly, our uncertainty estimates are defined under the deployed forecasting rule rather than unknown population parameters, yielding practicable estimators for real-world deployment.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), so the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases; human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a forecast-first uncertainty quantification framework for operational metrics when deploying frozen pretrained generators with limited real samples. It resides in the 'Post-Hoc Uncertainty Estimation via Auxiliary Models' leaf, which contains only three papers total. This leaf sits within the broader 'Uncertainty Quantification Frameworks for Frozen Pretrained Models' branch, indicating a relatively sparse research direction focused on attaching auxiliary models to frozen networks. The sibling papers address input-output conditioned uncertainty and probabilistic prototype calibration, suggesting the leaf covers diverse post-hoc strategies but remains underpopulated compared to generative model-based branches.

The taxonomy reveals neighboring leaves in 'Evidential and Meta-Learning Approaches' and 'Pretrained Uncertainty Modules,' both emphasizing meta-learned or evidential reasoning over frozen representations. The paper diverges from these by focusing on operational metrics and martingale posteriors rather than evidential frameworks or transfer learning. Adjacent branches like 'Generative Model-Based Uncertainty Estimation' contain substantially more papers (diffusion, GAN, Bayesian methods), indicating that generative-centric uncertainty is a more crowded area. The paper's emphasis on operational forecasting and time-consistent guarantees distinguishes it from these generative-focused directions, which typically target per-example or reconstruction uncertainty.

Among nine candidates examined across three contributions, none were flagged as clearly refuting the work. The prequential forecasting framework with Dirichlet blending examined one candidate with no refutation. The martingale posterior method examined five candidates, all non-refutable or unclear. The minimax hyperparameter criterion examined three candidates, again with no refutations. This limited search scope—nine papers total—suggests the analysis captures a narrow semantic neighborhood rather than exhaustive prior work. The absence of refutations within this small sample indicates the specific combination of martingale posteriors, Dirichlet blending, and operational metric forecasting may be underexplored, though broader literature beyond these nine candidates remains unexamined.

Given the sparse taxonomy leaf and limited search scope, the work appears to occupy a relatively novel position within post-hoc uncertainty estimation for frozen models. However, the analysis explicitly covers only top-K semantic matches and does not claim exhaustive coverage. The framework's integration of prequential forecasting, martingale posteriors, and operational metrics may represent a distinctive synthesis, but definitive novelty assessment would require examining a larger candidate pool and exploring connections to adjacent fields like online learning or sequential decision-making under uncertainty.

Taxonomy

Core-task taxonomy papers: 35
Claimed contributions: 3
Contribution candidate papers compared: 9
Refutable papers: 0

Research Landscape Overview

Core task: uncertainty quantification for operational metrics with frozen pretrained generators. The field addresses how to estimate predictive uncertainty when leveraging large pretrained models without retraining them, a practical constraint in many deployment scenarios. The taxonomy reveals several complementary directions: some branches focus on general frameworks for post-hoc uncertainty estimation via auxiliary models or ensemble-like methods, while others emphasize generative model-based approaches that exploit the stochastic nature of diffusion models or GANs. A third cluster explores uncertainty-aware learning with vision-language and multimodal models, adapting pretrained representations to downstream tasks while quantifying confidence. Additional branches address active learning strategies that use uncertainty to guide data collection, domain-specific applications ranging from medical imaging to geophysics, and data augmentation techniques that generate synthetic samples to probe model reliability.

Representative works such as Generative Model Uncertainty Imaging[1] and Conditional GAN Uncertainty[9] illustrate how generative architectures naturally produce distributional outputs, while Pretrained Models Virtual Metrology[2] and BayesCap[12] demonstrate auxiliary modeling strategies. A particularly active line of work centers on post-hoc methods that attach lightweight uncertainty estimators to frozen backbones, balancing computational efficiency with calibration quality.

Frozen Priors Fluid Forecasts[0] sits within this branch, sharing methodological kinship with Input Output Conditioned Uncertainty[4] and Probabilistic Prototype Calibration[3], which similarly avoid fine-tuning the base model. Compared to Input Output Conditioned Uncertainty[4], which conditions auxiliary networks on both inputs and outputs, Frozen Priors Fluid Forecasts[0] emphasizes operational metrics—quantities directly tied to decision-making in deployment contexts.
Meanwhile, Probabilistic Prototype Calibration[3] focuses on prototype-based representations, whereas Frozen Priors Fluid Forecasts[0] targets broader operational forecasting scenarios. These distinctions highlight ongoing trade-offs between generality, computational overhead, and the granularity of uncertainty estimates, with open questions around how to best propagate uncertainty from pretrained features to task-specific metrics without sacrificing the efficiency gains that motivate freezing the generator in the first place.

Claimed Contributions

Prequential forecasting framework with Dirichlet blending for frozen generative models

The authors propose a prequential forecasting approach that blends empirical data with a fixed pretrained generator using a Dirichlet-style schedule (λ_i = α/(i+α)). They prove this is the unique affine combination ensuring time-consistent forecasts, making the sequence of forecasted functionals form a martingale.

1 retrieved paper
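To make the blending rule concrete, the following is a minimal sketch of one draw from the Dirichlet-style mixture, assuming the frozen generator is available only as a black-box sampler. The function name `blended_sample` and the toy Gaussian "generator" are illustrative choices, not artifacts from the paper.

```python
import random

def blended_sample(history, generator_sample, alpha, rng=random):
    """Draw one observation from the Dirichlet-style blend.

    With i = len(history) real samples observed so far, the next draw
    comes from the frozen generator with probability
    lambda_i = alpha / (i + alpha), and from the empirical distribution
    with probability i / (i + alpha).
    """
    i = len(history)
    if i == 0 or rng.random() < alpha / (i + alpha):
        return generator_sample()      # frozen pretrained generator
    return rng.choice(history)         # empirical (Polya-urn style) draw

# Toy usage: a "generator" mimicking a score distribution.
rng = random.Random(0)
gen = lambda: rng.gauss(0.0, 1.0)
history = [0.2, -0.1, 0.4]             # three observed real scores
x = blended_sample(history, gen, alpha=2.0, rng=rng)
```

As i grows, lambda_i decays toward zero, so the forecast leans increasingly on the empirical distribution, which is what makes the forecasted functionals a martingale under the rule as claimed.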
Martingale posterior method for uncertainty quantification without retraining

The authors develop a martingale posterior approach that quantifies uncertainty by simulating future forecasts under the deployed blending rule. This method provides calibrated predictive intervals for operational metrics without requiring model retraining or likelihood evaluation.

5 retrieved papers
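The martingale posterior idea can be sketched as predictive resampling: forward-simulate future observations under the deployed blending rule, evaluate the operational metric on each simulated trajectory, and read an interval off the spread of the draws. This is a schematic illustration under simplifying assumptions (scalar scores, a toy Gaussian generator); `horizon`, `n_draws`, and the mean metric are illustrative defaults, not values from the paper.

```python
import random

def martingale_posterior(history, generator_sample, alpha, horizon=500,
                         n_draws=200, metric=lambda xs: sum(xs) / len(xs),
                         rng=None):
    """Sample the uncertainty in an operational metric by simulating
    the deployed forecast forward.

    Each draw extends the real history with `horizon` synthetic future
    observations, each sampled from the current blend (generator with
    probability alpha/(i+alpha), else a uniform draw from the sequence
    so far), then evaluates the metric on the extended sequence.
    """
    rng = rng or random.Random()
    draws = []
    for _ in range(n_draws):
        seq = list(history)
        for _ in range(horizon):
            i = len(seq)
            if i == 0 or rng.random() < alpha / (i + alpha):
                seq.append(generator_sample())
            else:
                seq.append(rng.choice(seq))
        draws.append(metric(seq))
    return draws

rng = random.Random(1)
draws = sorted(martingale_posterior([0.2, -0.1, 0.4],
                                    lambda: rng.gauss(0.0, 1.0),
                                    alpha=2.0, rng=rng))
lo, hi = draws[4], draws[-5]   # roughly a 95% interval from 200 draws
```

Note that no likelihood is ever evaluated: the generator is only sampled, which is why the method applies off-the-shelf to flows, diffusion models, autoregressive models, and GANs alike.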
Minimax criterion for hyperparameter selection in low-data regime

The authors provide a principled method for selecting the hyperparameter α by formulating a small-sample minimax problem that explicitly trades off sampling variance against model-data mismatch. This yields a closed-form expression α* = σ²/Δ² that is independent of sample size.

3 retrieved papers
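The closed form is simple enough to state as code. Below is a minimal sketch of the reported rule α* = σ²/Δ²; the function name and the interpretation comments are ours, and estimating σ² and Δ² from a small sample is the hard part left out here.

```python
def alpha_star(sigma2, delta2):
    """Closed-form minimax hyperparameter: alpha* = sigma^2 / Delta^2.

    sigma2: sampling variance of the score under the data distribution.
    delta2: squared model-data mismatch of the frozen generator.

    A large mismatch shrinks alpha (trust the few real samples); a noisy
    score with a well-matched generator grows alpha (lean on the
    generator). Notably, the expression does not involve n.
    """
    return sigma2 / delta2

a = alpha_star(sigma2=4.0, delta2=2.0)   # variance twice the mismatch
```

Because α* is independent of the sample size, it can be fixed once before deployment rather than re-tuned as observations arrive.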

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Prequential forecasting framework with Dirichlet blending for frozen generative models

Contribution 2: Martingale posterior method for uncertainty quantification without retraining

Contribution 3: Minimax criterion for hyperparameter selection in low-data regime