Conference item

Counterfactual multi-agent policy gradients

Abstract:

Many real-world problems, such as network packet routing and the coordination of autonomous vehicles, are naturally modelled as cooperative multi-agent systems. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors t...

Publication status:
Published
Peer review status:
Peer reviewed

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Oxford college:
St Catherine's College
Role:
Author
Oxford-Google DeepMind Graduate Scholarship
Grant:
CDT in Autonomous Intelligent Machines and Systems
Publisher:
AAAI Press
Journal:
32nd AAAI Conference on Artificial Intelligence (AAAI'18)
Pages:
2974-2982
Host title:
32nd AAAI Conference on Artificial Intelligence (AAAI'18)
Publication date:
2018-04-29
Acceptance date:
2017-11-09
ISSN:
2159-5399
Source identifiers:
745007
Keywords:
Pubs id:
pubs:745007
UUID:
uuid:37e732fe-a876-4699-8ee3-d556bfd235b3
Local pid:
pubs:745007
Deposit date:
2017-11-11
