Uncertainty of Joint Neural Contextual Bandit

doi:10.48550/arxiv.2406.02515

Preprint

Hao Guo, +1 more

- 04 Jun 2024

TL;DR: Joint neural contextual bandit model suffers from high uncertainty due to the large number of items. This paper analyzes the uncertainty and provides theoretical and experimental findings to guide hyper-parameter tuning.

Abstract: Contextual bandit learning is increasingly favored in modern large-scale recommendation systems. To better utilize contextual information and the available user or item features, neural networks have been integrated into contextual bandit learning, which has triggered significant interest from both academia and industry. However, a major challenge arises when implementing a disjoint neural contextual bandit solution in large-scale recommendation systems, where each item or user may correspond to a separate bandit arm. The huge number of items to recommend poses a significant hurdle for real-world production deployment. This paper focuses on a joint neural contextual bandit solution that serves all recommended items with a single model. The output combines a predicted reward $\mu$ and an uncertainty $\sigma$ through a hyper-parameter $\alpha$ that balances exploitation and exploration, e.g., $\mu + \alpha \sigma$. Tuning $\alpha$ is typically heuristic and complex in practice due to its stochastic nature. To address this challenge, we provide both theoretical analysis and experimental findings regarding the uncertainty $\sigma$ of the joint neural contextual bandit model. Our analysis reveals that $\sigma$ has an approximately square-root relationship with the size of the last hidden layer $F$ and an inverse square-root relationship with the amount of training data $N$, i.e., $\sigma \propto \sqrt{\frac{F}{N}}$. The experiments, conducted with real industrial data, align with the theoretical analysis, help explain model behavior, and assist hyper-parameter tuning during both offline training and online deployment.
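
The scoring rule and the scaling relation in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`ucb_score`, `relative_sigma`, `rescale_alpha`) are hypothetical, and the rescaling heuristic is only an assumption that follows from taking $\sigma \propto \sqrt{F/N}$ at face value, with the unknown proportionality constant cancelling out.

```python
import math


def ucb_score(mu: float, sigma: float, alpha: float) -> float:
    """Exploration-adjusted score mu + alpha * sigma, as in the abstract."""
    return mu + alpha * sigma


def relative_sigma(F: int, N: int) -> float:
    """sqrt(F / N): the uncertainty up to an unknown constant factor.

    F is the last hidden layer size and N the amount of training data.
    """
    if F <= 0 or N <= 0:
        raise ValueError("F and N must be positive")
    return math.sqrt(F / N)


def rescale_alpha(alpha: float, F_old: int, N_old: int,
                  F_new: int, N_new: int) -> float:
    """Hypothetical heuristic: choose a new alpha so that alpha * sigma
    stays the same after F or N changes, assuming sigma ∝ sqrt(F / N)."""
    return alpha * relative_sigma(F_old, N_old) / relative_sigma(F_new, N_new)


if __name__ == "__main__":
    # Quadrupling the training data halves sigma under the assumed relation,
    # so alpha is doubled to keep the exploration bonus unchanged.
    print(rescale_alpha(1.0, F_old=64, N_old=1_000_000,
                        F_new=64, N_new=4_000_000))
    # Score of a single item with illustrative (made-up) mu and sigma.
    print(ucb_score(mu=0.5, sigma=0.1, alpha=2.0))
```

Under this assumption, retraining on more data or widening the last hidden layer changes the effective exploration strength even when $\alpha$ is held fixed, which is one reason a value of $\alpha$ tuned offline may not carry over to a differently sized model or dataset.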


Figures

Figure 1: Joint neural contextual bandit.

Figure 2: Model's output prediction $\mu + \alpha \sigma$ with $\alpha = 0.01$.

Figure 3: Model's output prediction $\mu + \alpha \sigma$ with $\alpha = 100$.

Figure 4: Uncertainty $\sigma$ versus the last hidden layer size $F$; $\sigma \propto \sqrt{F}$.

Figure 5: Uncertainty $\sigma$ versus the training data amount $N$; $\sigma \propto \frac{1}{\sqrt{N}}$.
