Open Access Proceedings Article
Alternative Function Approximation Parameterizations for Solving Games: An Analysis of ƒ-Regression Counterfactual Regret Minimization
Ryan D'Orazio, Dustin Morrill, James R. Wright, Michael Bowling +3 more
- 05 May 2020
pp 339-347
6
TL;DR: In this article, the authors derive approximation error-aware regret bounds for (Φ, ƒ)-regret matching, which applies to a general class of link functions and regret objectives.
Abstract: Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a simple algorithm for approximately solving imperfect information games with normalized rectified linear unit (ReLU) parameterized policies. In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and yields a regret bound with a better dependence on the number of actions. We derive approximation error-aware regret bounds for (Φ, ƒ)-regret matching, which applies to a general class of link functions and regret objectives. These bounds recover a tighter bound for RCFR and provide a theoretical justification for RCFR implementations with alternative policy parameterizations (ƒ-RCFR), including softmax. We provide exploitability bounds for ƒ-RCFR with the polynomial and exponential link functions in zero-sum imperfect information games and examine empirically how the link function interacts with the severity of the approximation. We find that the previously studied ReLU parameterization performs better when the approximation error is small while the softmax parameterization can perform better when the approximation error is large.
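The role of the link function in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the polynomial link takes the form max(x, 0)^(p-1), so that p = 2 gives the normalized ReLU policy, and that the exponential link takes the form exp(tau * x), which normalizes to a softmax. The function names, the parameters `p` and `tau`, and the example regret values are hypothetical, and in RCFR the inputs would be regret estimates produced by a regressor rather than a fixed list.

```python
import math


def polynomial_link(regrets, p=2.0):
    """Assumed polynomial link: f(x)_i = max(x_i, 0)^(p-1); p=2 is the ReLU case."""
    return [max(r, 0.0) ** (p - 1.0) for r in regrets]


def exponential_link(regrets, tau=1.0):
    """Assumed exponential link: f(x)_i = exp(tau * x_i); normalizing gives softmax."""
    m = max(regrets)  # shifting by the max cancels on normalization and avoids overflow
    return [math.exp(tau * (r - m)) for r in regrets]


def link_policy(regrets, link):
    """Normalize the link outputs into a policy; use uniform if every weight is zero."""
    weights = link(regrets)
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [w / total for w in weights]


# Hypothetical (estimated) regrets for three actions.
regrets = [2.0, -1.0, 1.0]
relu_policy = link_policy(regrets, polynomial_link)
softmax_policy = link_policy(regrets, exponential_link)
print(relu_policy)     # the negative-regret action gets probability zero
print(softmax_policy)  # every action keeps some positive probability
```

With these inputs the ReLU policy is roughly [0.67, 0.0, 0.33], while the softmax policy keeps a small positive probability on the second action. This is one intuition for why the two parameterizations may respond differently to errors in the regret estimates.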
Citations
• Proceedings Article
Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients
Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Rémi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar A. Duéñez-Guzmán, Karl Tuyls +10 more
- 05 May 2020
TL;DR: An elegant one-line change to policy gradient methods is derived that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD), which quickly adapts to nonstationarities and outperforms policy gradient significantly in both tabular and function approximation settings.
32
• Posted Content
Neural Replicator Dynamics
Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Rémi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Karl Tuyls +8 more
TL;DR: An elegant one-line change to policy gradient methods is derived that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD), which quickly adapts to nonstationarities and outperforms policy gradient significantly in both tabular and function approximation settings.
29
• Posted Content
Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory
Yunlong Lu, Kai Yan +1 more
TL;DR: This survey introduces basic concepts and algorithms in single agent RL and multi-agent systems; then, it summarizes the related algorithms from three aspects.
• Posted Content
Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games
TL;DR: In this paper, the authors formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games, and introduce an extensive-form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set.
9
Regret Minimization with Function Approximation in Extensive-Form Games
Ryan D'Orazio
- 01 Jan 2020
TL;DR: Theoretical results for CFR are extended to the function approximation setting, and the resulting worst-case guarantees are complemented with experiments on several common benchmark games with sequential decision making and imperfect information.