Open Access | Posted Content
Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization
TL;DR: This work derives approximation error-aware regret bounds for $(\Phi, f)$-regret matching, which applies to a general class of link functions and regret objectives and provides a theoretical justification for RCFR implementations with alternative policy parameterizations, including softmax.
Abstract: Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a simple algorithm for approximately solving imperfect information games with normalized rectified linear unit (ReLU) parameterized policies. In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and yields a regret bound with a better dependence on the number of actions. We derive approximation error-aware regret bounds for $(\Phi, f)$-regret matching, which applies to a general class of link functions and regret objectives. These bounds recover a tighter bound for RCFR and provide a theoretical justification for RCFR implementations with alternative policy parameterizations ($f$-RCFR), including softmax. We provide exploitability bounds for $f$-RCFR with the polynomial and exponential link functions in zero-sum imperfect information games and examine empirically how the link function interacts with the severity of the approximation. We find that the previously studied ReLU parameterization performs better when the approximation error is small while the softmax parameterization can perform better when the approximation error is large.
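To make the contrast between the two parameterizations concrete, here is a minimal sketch of how a vector of estimated regrets (or any real-valued action scores) could be mapped to a policy under each link function. It is an illustration under stated assumptions, not the paper's implementation: the function names `relu_policy` and `softmax_policy`, the `temperature` parameter, and the uniform fallback when no score is positive are all hypothetical choices made for this example.

```python
import math


def relu_policy(scores):
    """Normalized ReLU: play each action in proportion to its positive part.

    Assumption: when no score is positive, fall back to the uniform policy.
    """
    positive = [max(s, 0.0) for s in scores]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(scores)] * len(scores)
    return [p / total for p in positive]


def softmax_policy(scores, temperature=1.0):
    """Softmax (exponential link): play each action in proportion to exp(score / temperature).

    `temperature` is a hypothetical tuning parameter introduced for this sketch.
    """
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - shift) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    estimated_regrets = [2.0, -1.0, 1.0]
    print("ReLU:   ", relu_policy(estimated_regrets))
    print("softmax:", softmax_policy(estimated_regrets))
```

The normalized ReLU assigns zero probability to any action whose estimated score is non-positive, while softmax keeps every action at strictly positive probability. This difference is one plausible reason the two could respond differently to approximation error in the estimates, which is the interaction the abstract says is examined empirically.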