Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization

Open Access • Posted Content

- 06 Dec 2019

10

TL;DR: This work derives approximation error-aware regret bounds for $(\Phi, f)$-regret matching, which applies to a general class of link functions and regret objectives and provides a theoretical justification for RCFR implementations with alternative policy parameterizations, including softmax.

Abstract: Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a simple algorithm for approximately solving imperfect information games with normalized rectified linear unit (ReLU) parameterized policies. In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and yields a regret bound with a better dependence on the number of actions. We derive approximation error-aware regret bounds for $(\Phi, f)$-regret matching, which applies to a general class of link functions and regret objectives. These bounds recover a tighter bound for RCFR and provide a theoretical justification for RCFR implementations with alternative policy parameterizations ($f$-RCFR), including softmax. We provide exploitability bounds for $f$-RCFR with the polynomial and exponential link functions in zero-sum imperfect information games and examine empirically how the link function interacts with the severity of the approximation. We find that the previously studied ReLU parameterization performs better when the approximation error is small while the softmax parameterization can perform better when the approximation error is large.
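To make the two parameterizations contrasted in the abstract concrete, here is a minimal sketch of turning a vector of (possibly approximated) action regrets into a policy with a polynomial link and with an exponential link. The function names, the exponent `p`, and the `temperature` parameter are illustrative assumptions rather than the paper's notation or code; with `p = 2` the polynomial link reduces to normalized ReLU, and the exponential link is the usual softmax.

```python
import math


def polynomial_link_policy(regrets, p=2.0):
    """Policy proportional to max(r, 0) ** (p - 1), assuming p > 1.

    With p = 2 this is the normalized ReLU rule (regret matching).
    Falls back to the uniform policy when no regret is positive.
    """
    weights = [max(r, 0.0) ** (p - 1.0) for r in regrets]
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [w / total for w in weights]


def exponential_link_policy(regrets, temperature=1.0):
    """Policy proportional to exp(r / temperature), i.e. a softmax.

    The maximum is subtracted before exponentiating for numerical
    stability; this does not change the resulting distribution.
    """
    scaled = [r / temperature for r in regrets]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical regret estimates for three actions.
    estimated_regrets = [1.0, -2.0, 3.0]
    print(polynomial_link_policy(estimated_regrets))
    print(exponential_link_policy(estimated_regrets))
```

One difference visible in this sketch: the polynomial link assigns zero probability to any action whose estimated regret is non-positive, while the exponential link always keeps every action in the support.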

Citations

•Proceedings Article

Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients

Daniel Hennes, +10 more

- 05 May 2020

TL;DR: An elegant one-line change to policy gradient methods is derived that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD), which quickly adapts to nonstationarities and outperforms policy gradient significantly in both tabular and function approximation settings.

32

•Posted Content

Neural Replicator Dynamics.

Shayegan Omidshafiei, +8 more

- 01 Jun 2019

- arXiv: Learning

TL;DR: An elegant one-line change to policy gradient methods is derived that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD), which quickly adapts to nonstationarities and outperforms policy gradient significantly in both tabular and function approximation settings.

29

•Posted Content

Algorithms in Multi-Agent Systems: A Holistic Perspective from Reinforcement Learning and Game Theory

Yunlong Lu, +1 more

- 17 Jan 2020

- arXiv: Computer Science and Game Theory

TL;DR: This survey introduces basic concepts and algorithms in single agent RL and multi-agent systems; then, it summarizes the related algorithms from three aspects.

14

•10.7939/R3-040J-9E84

Regret Minimization with Function Approximation in Extensive-Form Games

Ryan D'Orazio

- 01 Jan 2020

TL;DR: Theoretical results for CFR are extended when using function approximation, and worst-case guarantees with function approximation are complemented with experiments on several common benchmark games with sequential decision making and imperfect information.

2

Proceedings Article•10.1109/IJCNN55064.2022.9892417

Solving Poker Games Efficiently: Adaptive Memory based Deep Counterfactual Regret Minimization

Shuqing Shi, +4 more

- 18 Jul 2022

TL;DR: The adaptive memory sampling method is proposed which aims to find the distribution of the sampling length by using posterior sampling to update it iteratively and performs better than the state-of-the-art algorithms.

1

References

•Journal Article•10.1006/JCSS.1997.1504

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

Yoav Freund, +1 more

- 01 Aug 1997

TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.

18.6K

Learning from delayed rewards

Chris Watkins

- 01 Jan 1989

5.9K

Journal Article•10.1038/S41586-019-1724-Z

Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Oriol Vinyals, +41 more

- 30 Oct 2019

- Nature

TL;DR: The agent, AlphaStar, is evaluated, which uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.

4.3K

•Journal Article•10.1126/SCIENCE.AAR6404

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.

David Silver, +12 more

- 07 Dec 2018

- Science

TL;DR: This paper generalizes the AlphaZero approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games, and convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

4.1K

Mastering the game of Go with deep neural networks and tree search

Related Papers (5)

Bounds for Regret-Matching Algorithms.

Double Neural Counterfactual Regret Minimization

Online Model Selection for Reinforcement Learning with Function Approximation

Sparsity regret bounds for individual sequences in online linear regression

Contextual bandits with surrogate losses: Margin bounds and efficient algorithms