A Sample-Efficient Actor-Critic Algorithm for Recommendation Diversification
TL;DR: A novel actor-critic reinforcement learning algorithm for recommendation diversification in which the actor acts as the ranking policy, while the introduced critic predicts the expected future rewards of each candidate action.
Abstract: Diversifying recommendation results benefits users both by satisfying their existing interests and by exploring novel information needs. A recently proposed Monte-Carlo-based reinforcement learning method suffers from sample inefficiency and large variance, and can fail to perform well in large action spaces. We propose a novel actor-critic reinforcement learning algorithm for recommendation diversification to address these problems. The actor acts as the ranking policy, while the introduced critic predicts the expected future rewards of each candidate action. The critic target is updated using the full Bellman equation, and the actor network is optimized using the expected gradient over the whole action space. To further stabilize training and improve performance, we also add a policy-filtered critic supervision loss. Experiments on the MovieLens dataset demonstrate the effectiveness of our approach over multiple competitive methods.
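The two update rules named in the abstract can be illustrated with a minimal NumPy sketch. It rests on assumptions not stated in the abstract: a softmax policy over a small discrete set of candidate items, a "full Bellman" target read as the reward plus the discounted policy-weighted expectation of next-state critic values, and hypothetical function names (`bellman_target`, `expected_actor_gradient`). The policy-filtered critic supervision loss is not sketched because the abstract does not define it.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over candidate actions."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def bellman_target(reward, next_logits, next_q, gamma=0.9, done=False):
    """Critic target r + gamma * sum_a' pi(a'|s') Q(s', a').

    Assumption: the expectation is taken over every next action under
    the current policy instead of using a single sampled action.
    """
    if done:
        return float(reward)
    pi_next = softmax(next_logits)
    return float(reward + gamma * (pi_next @ next_q))

def expected_actor_gradient(logits, q_values):
    """Gradient of J = sum_a pi(a|s) Q(s, a) with respect to the logits.

    For a softmax policy, dJ/dz_k = pi_k * (Q_k - sum_a pi_a Q_a),
    so every candidate action contributes, not only a sampled one.
    """
    pi = softmax(logits)
    expected_q = pi @ q_values
    return pi * (q_values - expected_q)

# Toy state with three candidate items and a uniform initial policy.
logits = np.zeros(3)
q_values = np.array([1.0, 2.0, 3.0])
grad = expected_actor_gradient(logits, q_values)
target = bellman_target(1.0, logits, q_values, gamma=0.5)
```

In this toy case the gradient raises the logit of the highest-valued item and lowers that of the lowest-valued one, and the gradient components sum to zero, as expected for softmax logits.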
Citations
Multi-feedback Pairwise Ranking via Adversarial Training for Recommender
TL;DR: A novel Multi-feedback pairwise ranking method via Adversarial training (AT-MPR) for recommender systems that enhances robustness and overall performance in the event of rating pollution and outperforms state-of-the-art implicit feedback collaborative ranking models on two evaluation metrics.
Extractive text summarization model based on advantage actor-critic and graph matrix methodology.
TL;DR: Zhang et al. introduce an extractive text summarization model based on a graph matrix and advantage actor-critic (GA2C) method, in which the decision-making network makes decisions and sends the results to the evaluation network for scoring.
Examining Policy Entropy of Reinforcement Learning Agents for Personalization Tasks
TL;DR: In this paper, the authors examine the behavior of reinforcement learning systems in personalization environments and detail the differences in policy entropy associated with the type of learning algorithm used, showing that policy optimization agents often have low-entropy policies during training, which in practice results in agents prioritizing certain actions and avoiding others.
Research and Application of Rock Burst Hazard Assessment of the Working Face Based on the CF-TOPSIS Method
TL;DR: In this article, an improved comprehensive weighting prediction (CF-TOPSIS) method is proposed to predict weight and grade indices for rock burst evaluation in underground coal mines; the prediction results, combined with field drill cutting methods and microseismic monitoring data, verify the accuracy of the proposed method.
References
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto
- 01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
•Proceedings Article
Asynchronous methods for deep reinforcement learning
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P. Lillicrap, David Silver, Koray Kavukcuoglu
- 19 Jun 2016
TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
•Proceedings Article
Continuous control with deep reinforcement learning
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra
- 22 Jul 2016
TL;DR: In this paper, an actor-critic, model-free algorithm based on the deterministic policy gradient is proposed to operate over continuous action spaces, which is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain.
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Jaime Carbonell, Jade Goldstein
- 01 Aug 1998
TL;DR: A method for combining query-relevance with information-novelty in the context of text retrieval and summarization; preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single-document summarization.
Novelty and diversity in information retrieval evaluation
Charles L. A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, Ian MacKinnon
- 20 Jul 2008
TL;DR: This paper presents an evaluation framework that systematically rewards novelty and diversity, develops it into a specific evaluation measure based on cumulative gain, and demonstrates the feasibility of this approach using a test collection based on the TREC question answering track.