Abstract:

Recent years have witnessed increasing interest in tackling heteroscedastic noise in bandits and reinforcement learning \citep{zhou2021nearly, zhao2023variance, jia2024does, pacchiano2025second}. In these works, the cumulative variance of the noise $\Lambda = \sum_{t=1}^T \sigma_t^2$, where $\sigma_t^2$ is the variance of the noise at round $t$, has been used to characterize the statistical complexity of the problem, yielding simple regret bounds of order $\tilde{\mathcal O}(d \sqrt{\Lambda / T^2})$ for linear bandits with heteroscedastic noise \citep{zhou2021nearly, zhao2023variance}. However, on closer inspection, $\Lambda$ remains of the same order even if the noise is close to zero in half of the rounds, which indicates that the $\Lambda$-dependence is not optimal.

In this paper, we revisit the linear bandit problem with heteroscedastic noise. We consider the setting where the action set is fixed throughout the learning process. We propose a novel variance-adaptive algorithm, VAEE (Variance-Aware Exploration with Elimination), for large action sets, which actively explores the actions that maximize the information gain among a candidate set of actions that have not been eliminated. With this active-exploration strategy, we show that VAEE achieves a simple regret with a nearly harmonic-mean-dependent rate, i.e., $\tilde{\mathcal O}\Big(d\Big[\sum_{t = 1}^T \frac{1}{\sigma_t^2} - \sum_{i = 1}^{\tilde{O}(d)} \frac{1}{[\sigma^{(i)}]^2} \Big]^{-\frac{1}{2}}\Big)$, where $d$ is the dimension of the feature space and $\sigma^{(i)}$ is the $i$-th smallest variance among $\{\sigma_t\}_{t=1}^T$. For finitely many actions, we propose a variance-aware variant of G-optimal-design-based exploration, which achieves a $\tilde{\mathcal O}\bigg(\sqrt{d \log |\mathcal A|}\Big[\sum_{t = 1}^T \frac{1}{\sigma_t^2} - \sum_{i = 1}^{\tilde{O}(d)} \frac{1}{[\sigma^{(i)}]^2}\Big]^{-\frac{1}{2}}\bigg)$ simple regret bound. We also establish a nearly matching lower bound for the fixed action set setting, indicating that the \emph{harmonic-mean}-dependent rate is unavoidable. To the best of our knowledge, this is the first work that breaks the $\sqrt{\Lambda}$ barrier for linear bandits with heteroscedastic noise.
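The gap between the two rates can be made concrete with a quick numeric sketch. The variance profile below is hypothetical, and the $\tilde{O}(d)$ excluded terms in the harmonic-mean bound are ignored for simplicity: when half the rounds are nearly noiseless, the cumulative variance $\Lambda$ barely moves, but the sum of inverse variances explodes.

```python
import math

T, d = 10_000, 10
# Hypothetical profile: half the rounds are nearly noiseless.
sigma2 = [1e-6] * (T // 2) + [1.0] * (T // 2)

Lambda = sum(sigma2)                       # cumulative variance, ~ T/2
rate_cumulative = d * math.sqrt(Lambda / T**2)

# Harmonic-mean-style rate; the sum of inverse variances is
# dominated by the nearly noiseless rounds.
rate_harmonic = d * sum(1.0 / s for s in sigma2) ** -0.5

print(rate_cumulative, rate_harmonic)
```

On this profile the harmonic-mean rate comes out roughly 500 times smaller than the cumulative-variance rate, even though $\Lambda$ is essentially unchanged by the noiseless half.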

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes VAEE, a variance-adaptive algorithm for linear bandits with heteroscedastic noise in fixed action sets. It resides in the 'Heteroscedastic Noise Adaptation with Fixed Action Sets' leaf, which contains four papers total including this work. This leaf sits within the broader 'Variance-Adaptive Exploration Mechanisms' branch, indicating a moderately populated research direction focused on action-dependent noise variance. The paper's core claim is that existing cumulative variance bounds (Λ-dependence) are suboptimal and proposes a harmonic-mean-based characterization for tighter regret guarantees.

The taxonomy reveals neighboring leaves addressing variance-aware confidence sets, Bayesian approaches, and neural network enhancements, all within the same parent branch. Adjacent branches tackle non-stationary environments and noise dependency modeling, suggesting the field has diversified into temporal dynamics and correlation structures. The paper's focus on fixed action sets distinguishes it from time-varying environment methods in sibling branches, while its theoretical emphasis on regret bounds connects it to confidence-set construction techniques explored in parallel leaves. The scope notes clarify that methods without explicit variance incorporation belong elsewhere, positioning this work firmly in the variance-adaptive core.

Among ten candidates examined across three contributions, no refutable prior work was identified. Contribution A (VAEE algorithm) examined four candidates with zero refutations; Contribution B (G-optimal design exploration) examined six candidates, also with zero refutations; Contribution C (lower bound) was not evaluated against prior work. This limited search scope, covering roughly one-third of the taxonomy's twenty-six papers, suggests the analysis captures immediate neighbors but may not reflect the full landscape. The absence of refutations among examined candidates indicates that the specific algorithmic mechanisms and theoretical characterizations appear distinct within this sample.

Based on the top-ten semantic matches examined, the work appears to occupy a recognizable but not densely crowded position within variance-adaptive linear bandits. The harmonic-mean variance characterization and elimination-based exploration strategy differentiate it from the four sibling papers in its leaf, though the limited search scope prevents definitive claims about novelty relative to the broader field. The taxonomy structure suggests this is an active area with multiple complementary approaches, and the paper's theoretical contributions address a specific gap in variance-dependent regret analysis.

Taxonomy

Core-task Taxonomy Papers: 26
Claimed Contributions: 3
Contribution Candidate Papers Compared: 10
Refutable Papers: 0

Research Landscape Overview

Core task: variance-adaptive exploration in linear bandits with heteroscedastic noise. The field addresses sequential decision-making under uncertainty when reward noise varies across actions or contexts, requiring algorithms that adapt their exploration strategies to heterogeneous variance structures.

The taxonomy organizes research into several main branches: Variance-Adaptive Exploration Mechanisms develop core algorithmic principles for handling action-dependent noise, often through confidence-bound adjustments or information-directed sampling; Non-Stationary and Time-Varying Environments extend these ideas to settings where reward distributions drift over time; Noise Structure and Dependency Modeling examines correlated or non-i.i.d. noise patterns; Specialized Contextual and Structural Settings incorporate domain-specific constraints such as graph structures or latent representations; and Application-Driven Bandit Methods translate theoretical insights into practical domains like dynamic pricing and recommendation systems.

Representative works include Heteroscedastic IDS[2], which pioneered information-theoretic approaches to variance adaptation, and LNUCB-TA[1], which combines variance-aware confidence bounds with time-adaptive mechanisms. A particularly active line of work focuses on designing tight, instance-dependent confidence intervals that scale with local noise levels, as seen in Noise Adaptive Confidence[8] and Adaptive Heteroscedastic Learning[12], which refine exploration bonuses to avoid over-exploration in low-variance regions. Meanwhile, several studies tackle the interplay between nonstationarity and heteroscedasticity, such as Variance Dependent Nonstationary[5] and Nonstationary Latent Bandits[3], exploring how variance adaptation must account for temporal drift.
The original paper, Heteroscedastic Bandits[0], sits squarely within the Variance-Adaptive Exploration Mechanisms branch, specifically addressing heteroscedastic noise adaptation with fixed action sets. Its emphasis on principled variance estimation and regret bounds aligns closely with Noise Adaptive Confidence[8] and Adaptive Heteroscedastic Learning[12], though it appears to focus more directly on theoretical guarantees for fixed-arm scenarios rather than the time-varying or contextual extensions explored by neighbors like Nonstationary Latent Bandits[3].

Claimed Contributions

Variance-Aware Exploration with Elimination (VAEE) algorithm for large action sets

The authors introduce VAEE, a variance-adaptive algorithm designed for linear bandits with large or infinite action sets. The algorithm maintains a candidate set of promising actions and actively explores those that maximize information gain subject to elimination rules, achieving a simple regret bound with nearly harmonic-mean dependent rate on the noise variances.

4 retrieved papers
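The elimination-plus-information-gain step described above can be sketched as follows. This is an illustrative reading only, not the paper's actual VAEE specification: the function name, the inverse-variance weighting, and the confidence radius `beta` are all assumptions.

```python
import numpy as np

def vaee_round(actions, V, theta_hat, sigma2_t, beta):
    """One round of a VAEE-style elimination step (illustrative sketch only).

    actions   : (n, d) feature vectors of actions still alive
    V         : (d, d) regularized, variance-weighted Gram matrix
    theta_hat : (d,) current least-squares estimate
    sigma2_t  : scalar, (estimated) noise variance this round
    beta      : confidence radius (assumed given)
    """
    V_inv = np.linalg.inv(V)
    means = actions @ theta_hat
    widths = beta * np.sqrt(np.einsum("nd,de,ne->n", actions, V_inv, actions))

    # Eliminate actions whose upper confidence bound falls below the
    # best lower confidence bound among all candidates.
    alive = actions[means + widths >= (means - widths).max()]

    # Among survivors, explore the action with the largest information
    # gain, i.e. the largest inverse-variance-weighted uncertainty.
    gains = np.einsum("nd,de,ne->n", alive, V_inv, alive) / sigma2_t
    a = alive[np.argmax(gains)]

    # Inverse-variance-weighted update of the Gram matrix.
    V_next = V + np.outer(a, a) / sigma2_t
    return alive, a, V_next
```

The design choice mirrored here is that exploration is driven by inverse-variance-weighted uncertainty rather than reward optimism, so low-noise rounds contribute disproportionately to the Gram matrix.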
Variance-aware G-optimal design based exploration for finite action sets

The authors develop a variance-adaptive variant of G-optimal design based exploration specifically for finite action sets. This approach achieves improved simple regret bounds with better dependence on the dimension d compared to the large action set case, replacing the factor d with √(d log |A|).

6 retrieved papers
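A G-optimal design can be approximated with the classical Frank-Wolfe (Wynn-Fedorov) iteration. The sketch below folds hypothetical inverse-variance weights into the features; that fold-in is our simplification, not the paper's exact variance-aware construction.

```python
import numpy as np

def g_optimal_design(X, inv_var=None, iters=1000):
    """Frank-Wolfe sketch of a G-optimal design (Kiefer-Wolfowitz).

    X       : (n, d) action features
    inv_var : (n,) optional inverse-variance weights; folding them into
              the features is a simplification for illustration.
    Returns a design distribution pi over the n actions.
    """
    n, d = X.shape
    Xw = X if inv_var is None else X * np.sqrt(inv_var)[:, None]
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        A = (Xw * pi[:, None]).T @ Xw               # design matrix A(pi)
        g = np.einsum("nd,de,ne->n", Xw, np.linalg.inv(A), Xw)
        a = int(np.argmax(g))                       # worst-covered action
        step = (g[a] / d - 1.0) / (g[a] - 1.0)      # exact line search
        pi *= 1.0 - step                            # shrink, then move
        pi[a] += step                               # mass toward action a
    return pi
```

By the Kiefer-Wolfowitz equivalence theorem, the optimal design drives the worst-case leverage max_x x^T A(pi)^{-1} x down to d, which is the coverage property that G-optimal allocation arguments exploit.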
Nearly matching lower bound demonstrating harmonic-mean dependence is unavoidable

The authors prove a nearly matching instance-dependent lower bound for the fixed action set setting, showing that the harmonic-mean dependence on noise variances is fundamental to the problem. This establishes that their algorithms are essentially optimal in their variance dependence.

0 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Variance-Aware Exploration with Elimination (VAEE) algorithm for large action sets

The authors introduce VAEE, a variance-adaptive algorithm designed for linear bandits with large or infinite action sets. The algorithm maintains a candidate set of promising actions and actively explores those that maximize information gain subject to elimination rules, achieving a simple regret bound with nearly harmonic-mean dependent rate on the noise variances.

Contribution

Variance-aware G-optimal design based exploration for finite action sets

The authors develop a variance-adaptive variant of G-optimal design based exploration specifically for finite action sets. This approach achieves improved simple regret bounds with better dependence on the dimension d compared to the large action set case, replacing the factor d with √(d log |A|).

Contribution

Nearly matching lower bound demonstrating harmonic-mean dependence is unavoidable

The authors prove a nearly matching instance-dependent lower bound for the fixed action set setting, showing that the harmonic-mean dependence on noise variances is fundamental to the problem. This establishes that their algorithms are essentially optimal in their variance dependence.