Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

Journal Article

- Vol. abs/2202.13603

10

TL;DR: Under this framework, the authors design an algorithm that constructs a variance-aware confidence set based on empirical risk minimization and prove a variance-dependent regret bound for generalized linear bandits. They also give an algorithm based on a follow-the-regularized-leader (FTRL) subroutine and an online-to-confidence-set conversion, which can achieve a tighter variance-dependent regret bound under certain conditions.

Citations

Journal Article•10.48550/arXiv.2302.10371

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

Heyang Zhao, +4 more

- 21 Feb 2023

- arXiv.org

TL;DR: The authors propose a variance-adaptive algorithm for linear MDPs with heteroscedastic noise, which achieves a problem-dependent, horizon-free regret bound that can gracefully reduce to a nearly constant regret.

14

Journal Article•10.48550/arXiv.2301.13446

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

Runlong Zhou, +2 more

- 31 Jan 2023

- arXiv.org

TL;DR: In this paper, the authors study variance-dependent regret bounds for Markov decision processes (MDPs) and propose two new environment norms to characterize the fine-grained variance properties of the environment.

7

Proceedings Article•10.48550/arXiv.2205.13450

Variance-Aware Sparse Linear Bandits

Yan Dai, +2 more

- 26 May 2022

TL;DR: This paper presents the first variance-aware regret guarantee for sparse linear bandits, where σ_t^2 is the variance of the noise at the t-th time step, and naturally interpolates the regret bounds for the worst-case constant-variance regime and the benign deterministic regimes.

6

Journal Article•10.48550/arxiv.2311.04402

Likelihood Ratio Confidence Sets for Sequential Decision Making

N. Emmenegger, +2 more

- 08 Nov 2023

- arXiv.org

TL;DR: This paper revisits the likelihood-based inference principle and proposes using likelihood ratios to construct anytime-valid confidence sequences without requiring specialized treatment in each application scenario; the approach is especially suitable for problems with well-specified likelihoods.

4

Journal Article•10.48550/arxiv.2310.00968

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

Qiwei Di, +5 more

- 02 Oct 2023

- arXiv.org

TL;DR: This paper proposes a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound and performs empirical experiments on synthetic data to confirm the advantage of the method over previous variance-agnostic algorithms.

4

References

•Proceedings Article•10.1145/1772690.1772758

A contextual-bandit approach to personalized news article recommendation

Lihong Li, +3 more

- 26 Apr 2010

TL;DR: This work models personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

3.3K

•Book

Introduction to Online Convex Optimization

Elad Hazan

- 10 Aug 2016

TL;DR: This monograph portrays optimization as a process, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed.

1.8K

•Proceedings Article

Improved Algorithms for Linear Stochastic Bandits

Yasin Abbasi-Yadkori, +2 more

- 12 Dec 2011

TL;DR: A simple modification of Auer's UCB algorithm achieves constant regret with high probability, and the analysis improves the regret bound by a logarithmic factor, though experiments show a vast improvement.

1.7K

•Proceedings Article

Contextual Bandits with Linear Payoff Functions

Wei Chu, +3 more

- 01 Dec 2011

TL;DR: An O(√(Td ln^3(KT ln(T)/δ))) regret bound is proved that holds with probability 1 − δ for the simplest known upper confidence bound algorithm for this problem.

1K

•Journal Article•10.1006/INCO.1996.2612

Exponentiated gradient versus gradient descent for linear predictors

Jyrki Kivinen, +1 more

- 10 Jan 1997

- Information & Computation

TL;DR: The bounds suggest that the losses of the algorithms are in general incomparable, but EG(+/-) has a much smaller loss if only a few components of the input are relevant for the predictions, which is quite tight already on simple artificial data.

1K

...
