Dongruo Zhou
13 Papers
3 Citations
Dongruo Zhou is an academic researcher who has contributed to research in computer science. The author has an h-index of 5 and has co-authored 10 publications.
Papers
Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
Jiafan He,Dongruo Zhou,Tong Zhang,Qingsong Gu +3 more
- 13 May 2022
TL;DR: This paper proposes a new algorithm, based on the principle of optimism in the face of uncertainty, that achieves near-optimal regret in the corrupted and uncorrupted cases simultaneously. For both the known-C and unknown-C cases, the algorithm with a proper choice of hyperparameter achieves a regret that nearly matches the lower bounds.
31
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs
Dongruo Zhou,Qingsong Gu +1 more
- 23 May 2022
TL;DR: This paper proposes the first computationally efficient horizon-free algorithm for linear mixture MDPs, which achieves the optimal Õ(d√K + d²) regret up to logarithmic factors.
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency
TL;DR: Zhou et al. propose a variance-adaptive algorithm for linear MDPs with heteroscedastic noise that achieves a problem-dependent, horizon-free regret bound, which can gracefully reduce to a nearly constant regret.
Proceedings Article
Learning Neural Contextual Bandits through Perturbed Rewards
Yiling Jia,Weitong Zhang,Dongruo Zhou,Qingsong Gu,Hongning Wang +4 more
- 24 Jan 2022
TL;DR: The paper proves that an Õ(d̃√T) regret upper bound is still achievable under standard regularity conditions, where T is the number of rounds of interaction and d̃ is the effective dimension of a neural tangent kernel matrix.
10
Journal Article
Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds
TL;DR: Under this framework, the paper designs an algorithm that constructs a variance-aware confidence set based on empirical risk minimization and proves a variance-dependent regret bound for generalized linear bandits. It also designs an algorithm based on a follow-the-regularized-leader (FTRL) subroutine and online-to-confidence-set conversion, which can achieve a tighter variance-dependent regret under certain conditions.
10