Journal Article
Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds
Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu
- Vol. abs/2202.13603
TL;DR: Under a multi-level learning framework, the authors design an algorithm that constructs a variance-aware confidence set via empirical risk minimization and prove a variance-dependent regret bound. For generalized linear bandits, they further propose an algorithm based on a follow-the-regularized-leader (FTRL) subroutine and online-to-confidence-set conversion, which can achieve a tighter variance-dependent regret under certain conditions.
Abstract: We consider learning a stochastic bandit model, where the reward function belongs to a general class of uniformly bounded functions, and the additive noise can be heteroscedastic. Our model captures contextual linear bandits and generalized linear bandits as special cases. While previous works (Kirschner and Krause, 2018; Zhou et al., 2021) based on weighted ridge regression can deal with linear bandits with heteroscedastic noise, they are not directly applicable to our general model due to the curse of nonlinearity. In order to tackle this problem, we propose a multi-level learning framework for the general bandit model. The core idea of our framework is to partition the observed data into different levels according to the variance of their respective reward and perform online learning at each level collaboratively. Under our framework, we first design an algorithm that constructs the variance-aware confidence set based on empirical risk minimization and prove a variance-dependent regret bound. For generalized linear bandits, we further propose an algorithm based on follow-the-regularized-leader (FTRL) subroutine and online-to-confidence-set conversion, which can achieve a tighter variance-dependent regret under certain conditions.
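The partitioning step described in the abstract can be illustrated with a minimal sketch. It assumes that a variance (or an upper bound on it) is available for each observation and that levels are dyadic, with each level covering variances within a factor of two; the abstract does not state the paper's actual thresholds, so the names `variance_level`, `partition_by_variance`, `sigma2_min`, and `sigma2_max` are hypothetical. The per-level online learners are not shown.

```python
import math
from collections import defaultdict


def variance_level(sigma2, sigma2_min=1e-4, sigma2_max=1.0):
    """Map a noise variance to a dyadic level index (hypothetical rule).

    Level l holds variances in (sigma2_max / 2**(l + 1), sigma2_max / 2**l].
    """
    # Clip so that very small or very large variances land in the end levels.
    clipped = min(max(sigma2, sigma2_min), sigma2_max)
    return int(math.floor(math.log2(sigma2_max / clipped)))


def partition_by_variance(observations):
    """Group (context, reward, sigma2) triples by their variance level."""
    levels = defaultdict(list)
    for context, reward, sigma2 in observations:
        levels[variance_level(sigma2)].append((context, reward))
    return dict(levels)


if __name__ == "__main__":
    data = [("a", 1.0, 1.0), ("b", 0.5, 0.3), ("c", 0.2, 0.26)]
    for level, items in sorted(partition_by_variance(data).items()):
        print(level, items)
```

In a design of this kind, each level would then run its own estimator on data whose noise variances are comparable, which is what lets a confidence set scale with the variance at that level rather than with a worst-case bound.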
Citations
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency
TL;DR: The authors propose a variance-adaptive algorithm for linear MDPs with heteroscedastic noise, which achieves a problem-dependent, horizon-free regret bound that can gracefully reduce to a nearly constant regret.
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
TL;DR: The authors study variance-dependent regret bounds for Markov decision processes (MDPs) and propose two new environment norms to characterize the fine-grained variance properties of the environment.
Variance-Aware Sparse Linear Bandits
Yan Dai, Ruosong Wang, Simon S. Du
- 26 May 2022
TL;DR: This paper presents the first variance-aware regret guarantee for sparse linear bandits, stated in terms of σ_t^2, the variance of the noise at the t-th time step. The bound naturally interpolates between the regret bounds for the worst-case constant-variance regime and the benign deterministic regime.
Likelihood Ratio Confidence Sets for Sequential Decision Making
TL;DR: This paper revisits the likelihood-based inference principle and proposes using likelihood ratios to construct anytime-valid confidence sequences without requiring specialized treatment in each application scenario, an approach especially suitable for problems with well-specified likelihoods.
Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits
Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu
TL;DR: This paper proposes a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound and performs empirical experiments on synthetic data to confirm the advantage of the method over previous variance-agnostic algorithms.
References
A contextual-bandit approach to personalized news article recommendation
Lihong Li, Wei Chu, John Langford, Robert E. Schapire
- 26 Apr 2010
TL;DR: This work models personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
•Book
Introduction to Online Convex Optimization
Elad Hazan
- 10 Aug 2016
TL;DR: This monograph portrays optimization as a process, by applying an optimization method that learns as one goes along, learning from experience as more aspects of the problem are observed.
•Proceedings Article
Improved Algorithms for Linear Stochastic Bandits
Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári
- 12 Dec 2011
TL;DR: A simple modification of Auer's UCB algorithm achieves constant regret with high probability. The analysis improves the regret bound by only a logarithmic factor, though experiments show a vast improvement.
•Proceedings Article
Contextual bandits with linear Payoff functions
Wei Chu, Lihong Li, Lev Reyzin, Robert E. Schapire
- 01 Dec 2011
TL;DR: An O(√(Td ln^3(KT ln(T)/δ))) regret bound is proved that holds with probability 1 − δ for the simplest known upper confidence bound algorithm for this problem.
Exponentiated gradient versus gradient descent for linear predictors
Jyrki Kivinen, Manfred K. Warmuth
TL;DR: The bounds suggest that the losses of the algorithms are in general incomparable, but EG(+/-) has a much smaller loss if only a few components of the input are relevant for the predictions. Experiments show that the worst-case upper bounds are quite tight already on simple artificial data.