Online Prediction of Stochastic Sequences with High Probability Regret Bounds
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors derive high-probability regret bounds for universal prediction of stochastic sequences that achieve a convergence rate of O(T^(-1/2)δ^(-1/2)) with probability at least 1-δ, complementing prior bounds of order O(T^(-1/2)) that hold only in expectation. These bounds hold for general bounded loss functions and do not require the underlying spaces to be finite.
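One standard route to such high-probability bounds, sketched here purely for illustration (the paper's actual argument may differ), is to apply Markov's inequality to a moment bound on the regret R_T:

```latex
\[
\mathbb{E}[R_T] \le \frac{C}{\sqrt{T}}
\;\Longrightarrow\;
\mathbb{P}\!\left( R_T \ge \frac{C}{\delta\sqrt{T}} \right) \le \delta
\qquad \text{(Markov's inequality)},
\]
\[
\mathbb{E}[R_T^2] \le \frac{C^2}{T}
\;\Longrightarrow\;
\mathbb{P}\!\left( R_T \ge \frac{C}{\sqrt{\delta T}} \right) \le \delta
\qquad \text{(Markov's inequality applied to } R_T^2\text{)}.
\]
```

The second display recovers the stated O(T^(-1/2)δ^(-1/2)) rate: a first-moment bound alone gives a 1/δ factor, while a second-moment bound sharpens it to 1/√δ.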
The authors prove that the dependence on the error probability δ in their high-probability bounds cannot be improved below the polynomial factors 1/δ and 1/√δ (for example, to a logarithmic dependence such as log(1/δ)) without additional assumptions. This establishes a fundamental limit on the high-probability regret bounds achievable for this problem.
The authors extend the theoretical framework beyond finite spaces by using weaker technical assumptions than previous work, allowing their results to apply to more general measurable spaces while maintaining similar convergence guarantees.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
High-probability regret bounds for universal prediction of stochastic sequences
The authors derive high-probability regret bounds for universal prediction of stochastic sequences that achieve a convergence rate of O(T^(-1/2)δ^(-1/2)) with probability at least 1-δ, complementing prior bounds of order O(T^(-1/2)) that hold only in expectation. These bounds hold for general bounded loss functions and do not require the underlying spaces to be finite.
[56] Sequential probability assignment with contexts: Minimax regret, contextual Shtarkov sums, and contextual normalized maximum likelihood
[57] Sequential prediction of individual sequences under general loss functions
[58] Universal artificial intelligence: Sequential decisions based on algorithmic probability
[59] A universal probability assignment for prediction of individual sequences
Impossibility result for improving the error probability dependence
The authors prove that the dependence on the error probability δ in their high-probability bounds cannot be improved below the polynomial factors 1/δ and 1/√δ (for example, to a logarithmic dependence such as log(1/δ)) without additional assumptions. This establishes a fundamental limit on the high-probability regret bounds achievable for this problem.
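To see why a polynomial dependence on δ is hard to avoid, consider a toy regret distribution (a standard illustration, not taken from the paper) that satisfies the expectation bound exactly while saturating Markov's inequality:

```latex
\[
R_T =
\begin{cases}
\dfrac{1}{\delta\sqrt{T}} & \text{with probability } \delta,\\[6pt]
0 & \text{with probability } 1-\delta,
\end{cases}
\qquad
\mathbb{E}[R_T] = \frac{1}{\sqrt{T}},
\qquad
\mathbb{P}\!\left( R_T \ge \frac{1}{\delta\sqrt{T}} \right) = \delta .
\]
```

Any bound holding with probability at least 1-δ over this distribution must be at least 1/(δ√T), so an expectation bound alone cannot yield better than 1/δ dependence; an analogous construction with a second-moment constraint caps the improvement at 1/√δ.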
[47] No-regret learning for fair multi-agent social welfare optimization
[48] Proportional response: Contextual bandits for simple and cumulative regret minimization
[49] Projection by convolution: Optimal sample complexity for reinforcement learning in continuous-space MDPs
[50] Bandits with many optimal arms
[51] On lower bounds for standard and robust Gaussian process bandit optimization
[52] Logarithmic online regret bounds for undiscounted reinforcement learning
[53] Misspecified linear bandits
[54] Two studies in resource-efficient inference: Structural testing of networks, and selective classification
[55] Machine Learning Project Final Submission
Relaxation of technical assumptions compared to prior work
The authors extend the theoretical framework beyond finite spaces by using weaker technical assumptions than previous work, allowing their results to apply to more general measurable spaces while maintaining similar convergence guarantees.