Na Li - Learning decentralized policies in Multiagent Systems: How to learn efficiently and what are the learned policies?


Multiagent reinforcement learning has received growing interest across a variety of problem settings and applications. We will first present our recent work on learning decentralized policies in networked multiagent systems under a cooperative setting. Specifically, we propose a policy-gradient-based method that exploits the network structure and finds a local, decentralized policy that is an O(ρ^κ)-approximation of a first-order stationary point of the global objective for some ρ in (0,1), with complexity that scales with the local state-action space size of the largest κ-hop neighborhood of the network. Motivated by the question of characterizing the performance of the stationary points, we look into the case where states can be shared among agents but agents still need to take actions following decentralized policies. We show that even when agents have identical interests, the first-order stationary points correspond only to Nash equilibria. This observation naturally leads to the use of the stochastic game framework to characterize the performance of policy gradient methods for decentralized policies in multiagent MDP systems. We will show that for general stochastic games, the equivalence between stationary points and Nash equilibria still holds. However, additional properties of the stochastic games are needed for the global convergence of policy gradient. Joint work with Guannan Qu, Adam Wierman, Runyu Zhang, and Zhaolin Ren.

SEC 1.413.