Thompson Sampling via Fine-Tuning of LLMs
Overview
Overall Novelty Assessment
The paper proposes Thompson Sampling via Fine-Tuning (ToSFiT), which combines Thompson sampling with large language model fine-tuning to optimize over unstructured discrete spaces. It resides in the 'Thompson Sampling and Posterior Sampling' leaf, which contains only two papers total. This is a notably sparse research direction within the broader taxonomy of fifty papers, suggesting that posterior sampling approaches remain relatively underexplored compared to surrogate modeling or gradient-free acquisition optimization methods that dominate other branches.
The taxonomy reveals that most acquisition function optimization work concentrates on gradient-free reparameterization techniques or game-theoretic frameworks, both appearing in sibling leaves under the same parent category. The broader 'Acquisition Function Optimization and Search Strategies' branch sits alongside 'Surrogate Modeling and Representation Learning', which contains four distinct subcategories with substantially more papers. ToSFiT diverges from representation-learning approaches by avoiding explicit embeddings or latent spaces, instead directly parameterizing candidate selection probabilities through language model adaptation.
Among the thirty candidates examined, none clearly refutes the three core contributions. The theoretical regret bound for Variational Bayesian Optimistic Sampling accounting for reward correlation was checked against ten candidates with zero refutations. The ToSFiT algorithm, which combines pre-training with posterior fine-tuning, likewise showed no overlapping prior work among the ten papers examined. The empirical validation across FAQ refinement, protein search, and quantum circuit design similarly surfaced no refuting candidates in its ten-paper examination. These statistics suggest novelty within the limited search scope, though the small candidate pool leaves open the possibility of relevant work beyond the top thirty semantic matches.
The analysis indicates that ToSFiT occupies a sparsely populated research direction, with limited prior work on Thompson sampling for discrete Bayesian optimization. The absence of refutations across all contributions within the examined candidate set supports perceived novelty, though the thirty-paper scope represents a focused rather than exhaustive literature review. The taxonomy context confirms that posterior sampling remains less developed than alternative acquisition strategies in this domain.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors derive a tighter regret bound for Variational Bayesian Optimistic Sampling (VBOS) that scales with the maximal information gain γ_T rather than the domain size |X|, making it applicable to large discrete spaces. They also extend this bound to approximate VBOS implementations using gradient-based optimization.
The authors propose Thompson Sampling via Fine-Tuning (ToSFiT), which initializes a generative policy from a pre-trained LLM with prompt conditioning and then adapts it toward the posterior probability of maximality through fine-tuning, guided by their theoretical analysis.
The authors demonstrate ToSFiT's effectiveness on three distinct tasks spanning natural language, biological sequences, and quantum circuits, showing that it achieves state-of-the-art sample and computational efficiency compared with existing Bayesian optimization, reinforcement learning, and evolutionary methods.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[12] Amortized Bayesian optimization over discrete spaces
Contribution Analysis
Detailed comparisons for each claimed contribution
Improved regret bound for VBOS accounting for reward correlation
The authors derive a tighter regret bound for Variational Bayesian Optimistic Sampling (VBOS) that scales with the maximal information gain γ_T rather than the domain size |X|, making it applicable to large discrete spaces. They also extend this bound to approximate VBOS implementations using gradient-based optimization.
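In standard Bayesian-bandit notation, the claimed improvement replaces a dependence on the domain size |X| with one on the maximal information gain γ_T. The sketch below shows only the generic shape such bounds take; the paper's exact constants, logarithmic factors, and conditions are not reproduced here.

```latex
% Illustrative shape of the tightening (not the paper's exact statement):
\mathbb{E}[R_T] = \tilde{O}\!\left(\sqrt{T\,|\mathcal{X}|}\right)
\quad\longrightarrow\quad
\mathbb{E}[R_T] = \tilde{O}\!\left(\sqrt{T\,\gamma_T}\right),
\qquad
\gamma_T = \max_{A \subseteq \mathcal{X},\; |A| \le T} I\!\left(f;\, y_A\right)
```

When rewards are correlated across the domain, γ_T can be far smaller than |X|, which is what makes the bound usable over large discrete spaces.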
[25] Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic Reparameterization
[51] CO-BED: Information-Theoretic Contextual Optimization via Bayesian Experimental Design
[52] Human-in-the-loop Bayesian Optimization with No-Regret Guarantees
[53] Regret and belief complexity trade-off in Gaussian process bandits via information thresholding
[54] First-Order Bayesian Regret Analysis of Thompson Sampling
[55] Transportability for bandits with data from different environments
[56] An Information-Theoretic Approach to Bandits and Reinforcement Learning
[57] Regret Bounds for Information-Directed Reinforcement Learning
[58] Bayesian optimization in the wild: risk-averse and computationally-effective decision-making
[59] Exploring and Exploiting Model Uncertainty in Bayesian Optimization
ToSFiT algorithm combining pre-training with fine-tuning toward the posterior probability of maximality (PoM)
The authors propose Thompson Sampling via Fine-Tuning (ToSFiT), which initializes a generative policy from a pre-trained LLM with prompt conditioning and then adapts it toward the posterior probability of maximality through fine-tuning, guided by their theoretical analysis.
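The mechanism can be illustrated with a toy sketch: a softmax "policy" stands in for the fine-tuned LLM, and each round it is nudged toward a Monte-Carlo estimate of the posterior probability of maximality (PoM), since sampling candidates in proportion to PoM is exactly Thompson sampling. Everything here is an illustrative assumption, not the paper's implementation: the three-element candidate space, the Beta-Bernoulli reward model, the learning rate, and all names are made up for this sketch.

```python
# Toy sketch of Thompson sampling via "fine-tuning" a softmax policy toward
# the posterior probability of maximality (PoM). Illustrative only.
import math
import random

random.seed(7)

candidates = ["seq-A", "seq-B", "seq-C"]          # tiny unstructured discrete space
true_reward = {"seq-A": 0.1, "seq-B": 0.9, "seq-C": 0.3}  # hidden Bernoulli means

posterior = {c: [1.0, 1.0] for c in candidates}   # Beta(successes+1, failures+1)
logits = {c: 0.0 for c in candidates}             # stand-in for LLM parameters

def sample_beta(a, b):
    # Beta(a, b) draw via two Gamma draws.
    x = random.gammavariate(a, 1.0)
    y = random.gammavariate(b, 1.0)
    return x / (x + y)

def policy():
    # Softmax over logits: the generative "policy" we fine-tune.
    z = max(logits.values())
    w = {c: math.exp(v - z) for c, v in logits.items()}
    s = sum(w.values())
    return {c: w[c] / s for c in candidates}

def prob_of_max(n_draws=300):
    # Monte-Carlo estimate of P(candidate is the maximizer) under the posterior.
    wins = {c: 0 for c in candidates}
    for _ in range(n_draws):
        draw = {c: sample_beta(*posterior[c]) for c in candidates}
        wins[max(draw, key=draw.get)] += 1
    return {c: wins[c] / n_draws for c in candidates}

for t in range(200):
    p = policy()
    chosen = random.choices(candidates, weights=[p[c] for c in candidates])[0]
    reward = 1.0 if random.random() < true_reward[chosen] else 0.0
    posterior[chosen][0] += reward
    posterior[chosen][1] += 1.0 - reward
    # "Fine-tuning": cross-entropy gradient step pulling softmax(logits) -> PoM.
    pom = prob_of_max()
    for c in candidates:
        logits[c] += 0.5 * (pom[c] - p[c])

final = policy()
best = max(final, key=final.get)
print(best)  # with these reward gaps the policy concentrates on the true best arm
```

In a real ToSFiT-style system the softmax over three logits would be an LLM fine-tuned on the same cross-entropy target, and the Beta posterior would be whatever surrogate the method maintains over candidate quality.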
[60] On the importance of uncertainty in decision-making with large language models
[61] A Systematic Review on Optimization Approaches for Transformer and Large Language Models
[62] Searching for optimal solutions with LLMs via Bayesian optimization
[63] Deep Bayesian active learning for preference modeling in large language models
[64] Posterior sampling via autoregressive generation
[65] Code repair with LLMs gives an exploration-exploitation tradeoff
[66] Retrieval-Native Language Models: Integrating Parametric and Vector Memory with Bayesian Attention
[67] Knowledge-based question answering with large language models
[68] LMFuzz: Program repair fuzzing based on large language models
[69] Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking
Empirical validation across three diverse optimization tasks
The authors demonstrate ToSFiT's effectiveness on three distinct tasks spanning natural language, biological sequences, and quantum circuits, showing that it achieves state-of-the-art sample and computational efficiency compared with existing Bayesian optimization, reinforcement learning, and evolutionary methods.