Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

doi:10.48550/arXiv.2208.05363

Journal Article•10.48550/arXiv.2208.05363


Chris Junchi Li, +3 more

- 10 Aug 2022

- arXiv.org

- Vol. abs/2208.05363

1

TL;DR: A novel online learning algorithm is proposed that attains O(√T) regret with polynomial computational complexity, under very mild assumptions on the reward function and the underlying dynamics of the Markov game.

Abstract: We consider learning Nash equilibria in two-player zero-sum Markov games with nonlinear function approximation, where the action-value function is approximated by a function in a Reproducing Kernel Hilbert Space (RKHS). The key challenge is how to explore in the high-dimensional function space. We propose a novel online learning algorithm that finds a Nash equilibrium by minimizing the duality gap. At the core of our algorithms are upper and lower confidence bounds derived from the principle of optimism in the face of uncertainty. We prove that our algorithm attains O(√T) regret with polynomial computational complexity, under very mild assumptions on the reward function and the underlying dynamics of the Markov game. We also propose several extensions of our algorithm, including an algorithm with a Bernstein-type bonus that achieves a tighter regret bound, and another algorithm for model misspecification that can be applied to neural function approximation.
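The upper and lower confidence bounds mentioned in the abstract can be illustrated with a generic kernel ridge regression estimate plus an uncertainty bonus. The sketch below is not the paper's algorithm: the RBF kernel, the function names `rbf_kernel` and `kernel_confidence_bounds`, and the parameters `lam` (ridge regularizer), `beta` (bonus scale), and `lengthscale` are all assumptions chosen for illustration. It only shows how an optimistic and a pessimistic value estimate might be formed for encoded state-action inputs.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def kernel_confidence_bounds(Z_train, y_train, Z_query, lam=1.0, beta=1.0):
    """Return (upper, lower) confidence bounds at the query points.

    Z_train: (n, d) array of encoded state-action inputs (illustrative).
    y_train: (n,) array of regression targets.
    Z_query: (m, d) array of inputs at which to evaluate the bounds.
    lam, beta: hypothetical regularization and bonus-scale parameters.
    """
    K = rbf_kernel(Z_train, Z_train)            # (n, n) Gram matrix
    k_q = rbf_kernel(Z_train, Z_query)          # (n, m) cross-kernel
    A = K + lam * np.eye(len(Z_train))          # regularized Gram matrix

    # Kernel ridge regression mean: k_q^T (K + lam I)^{-1} y.
    mean = k_q.T @ np.linalg.solve(A, y_train)

    # Uncertainty width: sqrt((k(z, z) - k_q^T (K + lam I)^{-1} k_q) / lam).
    # For the RBF kernel, k(z, z) = 1 at every point.
    v = np.linalg.solve(A, k_q)
    width_sq = (1.0 - np.sum(k_q * v, axis=0)) / lam
    bonus = beta * np.sqrt(np.maximum(width_sq, 0.0))

    return mean + bonus, mean - bonus
```

In this sketch the bonus shrinks near inputs that have already been observed and approaches `beta / sqrt(lam)` far from the data, which is the behaviour an optimism-based exploration rule relies on. How the bonus scale should be set, and how the two bounds are combined across players, is specified in the paper and is not reproduced here.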


Citations

Journal Article•10.48550/arXiv.2305.09659

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

Jose Blanchet, +3 more

- 16 May 2023

- arXiv.org

TL;DR: The authors propose a general learning principle, called double pessimism, for robust offline RL and show that it is provably efficient in the context of general function approximation.


References

Journal Article•10.1038/S41586-019-1724-Z

Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Oriol Vinyals, +41 more

- 30 Oct 2019

- Nature

TL;DR: The agent AlphaStar, which uses a multi-agent reinforcement learning algorithm, is evaluated and shown to reach Grandmaster level, ranking among the top 0.2% of human players in the real-time strategy game StarCraft II.


4.3K

Proceedings Article•10.1145/800057.808695

A new polynomial-time algorithm for linear programming

Narendra Karmarkar

- 01 Dec 1984

TL;DR: The algorithm consists of repeated application of projective transformations, each followed by optimization over an inscribed sphere, to create a sequence of points that converges to the optimal solution in polynomial time.


4.2K

Book Chapter•10.1016/B978-1-55860-335-6.50027-1

Markov games as a framework for multi-agent reinforcement learning

Michael L. Littman

- 10 Jul 1994

TL;DR: A Q-learning-like algorithm for finding optimal policies is described, and its application to a simple two-player game in which the optimal policy is probabilistic is demonstrated.


3.2K

Journal Article•10.1073/PNAS.39.10.1095

Stochastic Games

Lloyd S. Shapley

- 01 Oct 1953

- Proceedings of the National Academy of S...

TL;DR: In a stochastic game the play proceeds by steps from position to position, according to transition probabilities controlled jointly by the two players; the expected total gain or loss is bounded by M, and the process depends on N² + N matrices.


3.1K
