Preprint, DOI: 10.48550/arxiv.2406.02515
Uncertainty of Joint Neural Contextual Bandit
Hao Guo,Zheqing Zhu +1 more
- 04 Jun 2024
TL;DR: The joint neural contextual bandit model suffers from high uncertainty due to the large number of items. This paper analyzes that uncertainty and provides theoretical and experimental findings to guide hyper-parameter tuning.
Abstract: Contextual bandit learning is increasingly favored in modern large-scale recommendation systems. To better utilize contextual information and the available user or item features, neural networks have been integrated into contextual bandit learning, which has triggered significant interest from both academia and industry. However, a major challenge arises when implementing a disjoint neural contextual bandit solution in large-scale recommendation systems, where each item or user may correspond to a separate bandit arm: the huge number of items to recommend poses a significant hurdle for real-world production deployment. This paper focuses on a joint neural contextual bandit solution, which serves all items to recommend with a single model. The model outputs a predicted reward $\mu$ and an uncertainty $\sigma$, which are combined through a hyper-parameter $\alpha$ that balances exploitation and exploration, e.g., $\mu + \alpha \sigma$. Tuning $\alpha$ is typically heuristic and complex in practice due to its stochastic nature. To address this challenge, we provide both theoretical analysis and experimental findings regarding the uncertainty $\sigma$ of the joint neural contextual bandit model. Our analysis reveals that $\sigma$ exhibits an approximate square-root relationship with the size of the last hidden layer $F$ and an inverse square-root relationship with the amount of training data $N$, i.e., $\sigma \propto \sqrt{\frac{F}{N}}$. The experiments, conducted with real industrial data, align with the theoretical analysis, help explain model behavior, and assist hyper-parameter tuning during both offline training and online deployment.
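The scaling $\sigma \propto \sqrt{\frac{F}{N}}$ stated in the abstract can be illustrated with a small numerical sketch. The abstract does not say how $\sigma$ is computed, so the code below assumes a LinUCB-style estimate on last-hidden-layer features, $\sigma = \sqrt{\phi^\top A^{-1} \phi}$ with $A = \lambda I + \sum_i \phi_i \phi_i^\top$, and uses i.i.d. standard normal vectors as a stand-in for real activations. The function names (`ucb_score`, `mean_sigma`) and the regularizer `lam` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def ucb_score(mu, sigma, alpha):
    """Exploration-adjusted score mu + alpha * sigma, as written in the abstract."""
    return mu + alpha * sigma


def mean_sigma(F, N, lam=1.0, n_query=200, seed=0):
    """Average LinUCB-style uncertainty over random query features.

    ASSUMPTION: last-hidden-layer activations are simulated as i.i.d.
    standard normal vectors of size F; a real model would supply them.
    """
    rng = np.random.default_rng(seed)
    train_features = rng.standard_normal((N, F))  # N training examples
    # Regularized Gram matrix A = lam * I + sum_i phi_i phi_i^T
    A = lam * np.eye(F) + train_features.T @ train_features
    A_inv = np.linalg.inv(A)
    queries = rng.standard_normal((n_query, F))
    # sigma_q = sqrt(phi_q^T A^{-1} phi_q) for every query row q
    sigmas = np.sqrt(np.einsum("ij,jk,ik->i", queries, A_inv, queries))
    return float(sigmas.mean())


if __name__ == "__main__":
    for F, N in [(16, 4000), (16, 16000), (64, 4000)]:
        print(F, N, mean_sigma(F, N), np.sqrt(F / N))
```

Under these assumptions, quadrupling $N$ roughly halves the simulated $\sigma$ and quadrupling $F$ roughly doubles it. For a fixed exploration bonus $\alpha \sigma$, this suggests that $\alpha$ may need to be rescaled whenever the last hidden layer size or the training set size changes.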