Efficient Simple Regret Algorithms for Stochastic Contextual Bandits

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: contextual bandits, logistic bandits, simple regret, Thompson sampling
Abstract:

We study stochastic contextual logistic bandits under the simple regret objective. While simple regret guarantees are known for the linear case, no such results existed for the logistic setting. Building on ideas from contextual linear bandits and self-concordant analysis, we propose the first algorithm that achieves simple regret Õ(d/√T). Notably, the leading term of our regret bound is free of κ = O(exp(S)), where S is a bound on the magnitude of the unknown parameter vector, while the algorithm remains computationally tractable for finite action sets. We also introduce a new variant of Thompson Sampling adapted to the simple-regret setting, which yields the first simple regret guarantee for randomized algorithms in stochastic contextual linear bandits. Extending these tools to the logistic case, we obtain a Thompson Sampling variant with regret Õ(d^(3/2)/√T), again free of κ in the leading term. The randomized algorithms, as expected, are cheaper to run than their deterministic counterparts. Finally, we conduct a series of experiments to empirically validate these theoretical guarantees.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's claimed tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes the first simple regret algorithm for stochastic contextual logistic bandits, achieving Õ(d/√T) regret with a leading term free of the exponential parameter κ. It also introduces a Thompson Sampling variant for simple regret in contextual linear bandits, extending to the logistic case with Õ(d^(3/2)/√T) regret. Within the taxonomy, the work sits in the 'Deterministic Simple Regret Algorithms' leaf alongside one sibling paper (fc11b0c3a4fe1e4e7e3fff4a08148db3), indicating a relatively sparse research direction focused on provable simple regret bounds for contextual bandits under stochastic rewards.

The taxonomy reveals that the broader 'Core Simple Regret Algorithms for Contextual Bandits' branch contains three leaves: deterministic methods, randomized approaches, and Bayesian experimental design. The paper bridges deterministic and randomized categories by proposing both types of algorithms. Neighboring branches address structural assumptions (e.g., Lipschitz rewards, adaptive contexts) and dual-objective frameworks balancing simple and cumulative regret. The scope notes clarify that this work excludes cumulative regret methods and non-contextual settings, positioning it firmly within pure simple regret minimization for contextual bandits with stochastic feedback.

Among seventeen candidates examined across three contributions, none were found to clearly refute the claimed novelty. The first contribution (logistic contextual bandits) examined one candidate with no refutation. The second contribution (randomized simple regret for linear contextual bandits) examined six candidates, all non-refutable or unclear. The third contribution (novel analysis paradigm) examined ten candidates, again with no refutations. This suggests that within the limited search scope—top-K semantic matches plus citation expansion—the paper's contributions appear distinct from existing work, though the small candidate pool (especially one candidate for the logistic claim) limits the strength of this signal.

Based on the limited literature search, the work appears to occupy a genuinely sparse area: simple regret for logistic contextual bandits and randomized simple regret for linear contextual bandits both lack direct prior results among the examined candidates. The taxonomy structure confirms that simple regret minimization itself is less crowded than cumulative regret research. However, the analysis covers only seventeen candidates, not an exhaustive survey, so stronger novelty claims would require broader verification beyond top-K semantic retrieval.

Taxonomy

Core-task Taxonomy Papers: 14
Claimed Contributions: 3
Contribution Candidate Papers Compared: 17
Refutable Papers: 0

Research Landscape Overview

Core task: simple regret minimization in stochastic contextual bandits. The field centers on identifying the best arm or policy after a fixed exploration budget, rather than minimizing cumulative regret during learning. The taxonomy reveals several main branches: core algorithmic approaches that directly tackle simple regret (including deterministic and randomized strategies), structural assumptions and problem variants that exploit specific reward or context structures, dual-objective frameworks that balance simple regret with cumulative regret or other constraints, extensions to infinite or continuous arm spaces, connections to active learning and policy optimization, and broader bandit theory surveys.

Representative works such as Efficient Simple Regret Algorithms[0] and Simple Regret Minimization for[1] illustrate how deterministic elimination-style methods can achieve strong guarantees, while other branches like Minimal Exploration in Structured[3] or Causal Contextual Bandits with[4] show how additional structure (e.g., causal graphs or low-dimensional embeddings) can reduce sample complexity. A particularly active line of research contrasts pure simple regret methods with hybrid approaches that must also control cumulative regret or satisfy resource constraints, as seen in works like Bandits with Knapsacks beyond[14] or Multi-Armed Bandit With Several[12]. Another theme involves extending finite-arm guarantees to infinite or continuous settings, exemplified by Simple regret for infinitely[2], which raises open questions about discretization and adaptive sampling.

Within this landscape, Efficient Simple Regret Algorithms[0] sits squarely in the deterministic core branch alongside Simple Regret Minimization for[1], emphasizing tight sample complexity bounds and elimination-based exploration. Compared to neighbors like Minimal Exploration in Structured[3], which leverages problem structure, or randomized approaches such as OSOM[10], the deterministic focus of Efficient Simple Regret Algorithms[0] prioritizes worst-case guarantees and algorithmic simplicity, reflecting a classical yet foundational perspective on the simple regret objective.
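As a reference point for the discussion above, the simple regret objective can be written down as follows. This is a standard textbook formulation, not notation taken from the paper under review: the learner explores for a budget of T rounds, outputs a decision rule π̂, and is judged only on the quality of that final output.

```latex
% Simple regret of the recommended rule \hat{\pi} after T exploration rounds,
% in expectation over the context x. The link \mu is the identity in the
% linear case and the sigmoid \mu(z) = 1/(1+e^{-z}) in the logistic case.
\mathrm{SReg}(T) \;=\; \mathbb{E}_{x}\Big[\max_{a \in \mathcal{A}}
    \mu\big(x_a^{\top}\theta^{*}\big)
    \;-\; \mu\big(x_{\hat{\pi}(x)}^{\top}\theta^{*}\big)\Big],
\qquad
\mathrm{CReg}(T) \;=\; \sum_{t=1}^{T}\Big[\max_{a \in \mathcal{A}}
    \mu\big(x_{t,a}^{\top}\theta^{*}\big)
    \;-\; \mu\big(x_{t,a_t}^{\top}\theta^{*}\big)\Big].
```

The contrast with the cumulative regret CReg(T) explains why exploration can be far more aggressive under the simple regret objective: suboptimal plays during the budget carry no cost in themselves.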

Claimed Contributions

First simple regret algorithm for logistic contextual bandits

The authors introduce the first algorithm for stochastic contextual logistic bandits under the simple regret objective, achieving Õ(d/√T) regret with the leading term independent of the potentially exponentially large constant κ. The algorithm uses uncertainty-based action selection with self-concordant analysis.

1 retrieved paper
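The contribution above mentions uncertainty-based action selection under a self-concordant (logistic) model. A minimal sketch of that generic pattern, not the paper's actual algorithm, combines a regularized logistic MLE with selection of the action whose curvature-weighted uncertainty width is largest; all function names and parameter values here are our own illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_mle(X, y, iters=50, lam=1.0):
    """Regularized logistic MLE via Newton's method.
    X: (n, d) played feature vectors; y: (n,) binary rewards."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        W = p * (1 - p)                              # mu'(x^T theta) per sample
        H = (X * W[:, None]).T @ X + lam * np.eye(d) # curvature-weighted design
        theta -= np.linalg.solve(H, grad)
    return theta

def select_most_uncertain(actions, X_hist, theta, lam=1.0):
    """Pick the action with the largest width ||x||_{H^{-1}},
    where H is the curvature-weighted design matrix at theta."""
    d = actions.shape[1]
    if len(X_hist) == 0:
        H = lam * np.eye(d)
    else:
        Xh = np.asarray(X_hist)
        W = sigmoid(Xh @ theta) * (1 - sigmoid(Xh @ theta))
        H = (Xh * W[:, None]).T @ Xh + lam * np.eye(d)
    Hinv = np.linalg.inv(H)
    widths = np.einsum('ad,dk,ak->a', actions, Hinv, actions)  # x^T H^-1 x per action
    return int(np.argmax(widths))
```

Because the curvature weights μ'(xᵀθ) appear in H, directions where the sigmoid is flat contribute less information, which is the mechanism self-concordant analyses exploit to keep κ out of the leading regret term.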
First simple regret guarantee for randomized algorithms in contextual linear bandits

The authors propose a Thompson Sampling variant specifically designed for simple regret in stochastic contextual linear bandits, providing the first theoretical guarantee for randomized algorithms in this setting with Õ(d^(3/2)/√T) regret bound.

6 retrieved papers
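For the randomized contribution above, the explore-then-recommend pattern behind a simple-regret Thompson Sampling variant can be sketched as follows. This is a hedged illustration of the generic idea (posterior sampling during the budget, greedy recommendation from the posterior mean at the end), not the authors' algorithm; `scale`, `lam`, and the Gaussian posterior are illustrative assumptions:

```python
import numpy as np

def ts_simple_regret(actions, theta_star, T, noise=0.1, lam=1.0,
                     scale=1.0, rng=None):
    """Thompson-sampling exploration for T rounds, then a greedy
    recommendation from the posterior mean (simple-regret setting:
    only the final recommendation is scored)."""
    rng = np.random.default_rng(rng)
    K, d = actions.shape
    V = lam * np.eye(d)              # regularized design matrix
    b = np.zeros(d)
    for _ in range(T):
        theta_hat = np.linalg.solve(V, b)
        cov = scale**2 * np.linalg.inv(V)
        theta_tilde = rng.multivariate_normal(theta_hat, cov)  # posterior sample
        a = int(np.argmax(actions @ theta_tilde))              # randomized play
        x = actions[a]
        r = x @ theta_star + noise * rng.normal()              # stochastic reward
        V += np.outer(x, x)
        b += r * x
    theta_hat = np.linalg.solve(V, b)
    return int(np.argmax(actions @ theta_hat))  # final recommendation
```

Each round costs one d×d solve and one Gaussian sample, which is why randomized methods of this kind are cheaper to run than deterministic counterparts that must optimize an uncertainty width over the whole action set.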
Novel analysis paradigm for randomized simple regret algorithms

The authors develop a new analytical framework for studying randomized algorithms under simple regret objectives in contextual bandits, which differs fundamentally from cumulative regret analysis and could serve as a foundation for analyzing related problems.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

First simple regret algorithm for logistic contextual bandits

The authors introduce the first algorithm for stochastic contextual logistic bandits under the simple regret objective, achieving Õ(d/√T) regret with the leading term independent of the potentially exponentially large constant κ. The algorithm uses uncertainty-based action selection with self-concordant analysis.

Contribution

First simple regret guarantee for randomized algorithms in contextual linear bandits

The authors propose a Thompson Sampling variant specifically designed for simple regret in stochastic contextual linear bandits, providing the first theoretical guarantee for randomized algorithms in this setting with Õ(d^(3/2)/√T) regret bound.

Contribution

Novel analysis paradigm for randomized simple regret algorithms

The authors develop a new analytical framework for studying randomized algorithms under simple regret objectives in contextual bandits, which differs fundamentally from cumulative regret analysis and could serve as a foundation for analyzing related problems.