Alternative Function Approximation Parameterizations for Solving Games: An Analysis of ƒ-Regression Counterfactual Regret Minimization

Open Access Proceedings Article

- 05 May 2020

pp 339-347

6

TL;DR: The authors derive approximation error-aware regret bounds for (Φ, ƒ)-regret matching, which applies to a general class of link functions and regret objectives.

Abstract: Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a simple algorithm for approximately solving imperfect information games with normalized rectified linear unit (ReLU) parameterized policies. In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and yields a regret bound with a better dependence on the number of actions. We derive approximation error-aware regret bounds for (Φ, ƒ)-regret matching, which applies to a general class of link functions and regret objectives. These bounds recover a tighter bound for RCFR and provide a theoretical justification for RCFR implementations with alternative policy parameterizations (ƒ-RCFR), including softmax. We provide exploitability bounds for ƒ-RCFR with the polynomial and exponential link functions in zero-sum imperfect information games and examine empirically how the link function interacts with the severity of the approximation. We find that the previously studied ReLU parameterization performs better when the approximation error is small, while the softmax parameterization can perform better when the approximation error is large.
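To make the two parameterizations named in the abstract concrete, the following is a minimal sketch of how a policy could be derived from a vector of estimated regrets under a polynomial link (which, with exponent p = 2, reduces to the normalized ReLU form) and under an exponential link (softmax). The function names, the `temperature` parameter, and the uniform fallback when no regret is positive are assumptions made for illustration; they are not taken from the paper.

```python
import math

def polynomial_link_policy(estimated_regrets, p=2.0):
    """Policy proportional to max(r, 0)^(p - 1); assumes p > 1.

    With p = 2 this is the normalized ReLU parameterization.
    """
    weights = [max(r, 0.0) ** (p - 1.0) for r in estimated_regrets]
    total = sum(weights)
    if total <= 0.0:
        # Assumed fallback: play uniformly when no regret is positive.
        n = len(estimated_regrets)
        return [1.0 / n] * n
    return [w / total for w in weights]

def exponential_link_policy(estimated_regrets, temperature=1.0):
    """Policy proportional to exp(r / temperature), i.e. a softmax."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(estimated_regrets)
    weights = [math.exp((r - top) / temperature) for r in estimated_regrets]
    total = sum(weights)
    return [w / total for w in weights]

regrets = [1.0, -1.0, 3.0]  # hypothetical regret estimates for three actions
relu_policy = polynomial_link_policy(regrets)
softmax_policy = exponential_link_policy(regrets)
```

In this sketch the polynomial link assigns zero probability to any action with non-positive estimated regret, whereas the exponential link keeps every action's probability strictly positive.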

Citations

•Proceedings Article

Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients

Daniel Hennes, +10 more

- 05 May 2020

TL;DR: An elegant one-line change to policy gradient methods is derived that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD), which quickly adapts to nonstationarities and outperforms policy gradient significantly in both tabular and function approximation settings.

32

•Posted Content

Neural Replicator Dynamics.

Shayegan Omidshafiei, +8 more

- 01 Jun 2019

- arXiv: Learning

TL;DR: An elegant one-line change to policy gradient methods is derived that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD), which quickly adapts to nonstationarities and outperforms policy gradient significantly in both tabular and function approximation settings.

29

•Posted Content

Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory

Yunlong Lu, +1 more

- 17 Jan 2020

- arXiv: Computer Science and Game Theory

TL;DR: This survey introduces basic concepts and algorithms in single agent RL and multi-agent systems; then, it summarizes the related algorithms from three aspects.

14

•Posted Content

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Dustin Morrill, +5 more

- 13 Feb 2021

- arXiv: Computer Science and Game Theory

TL;DR: In this paper, the authors formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games, and introduce an extensive form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set.

9

•10.7939/R3-040J-9E84

Regret Minimization with Function Approximation in Extensive-Form Games

Ryan D'Orazio

- 01 Jan 2020

TL;DR: Theoretical results for CFR are extended when using function approximation, and worst-case guarantees with function approximation are complemented with experiments on several common benchmark games with sequential decision making and imperfect information.

2