This paper seeks to establish a framework for directing a society of simple, specialized, self-interested agents to solve problems that are traditionally posed as monolithic single-agent sequential decision problems. Using a decentralized approach to collectively optimize a central objective is challenging because the equilibrium strategy profile of a non-cooperative game is difficult to characterize. To overcome this challenge, we design a mechanism for defining the learning environment of each agent for which we know that the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents optimizing their own local objectives. The society functions as an economy of agents that learn the credit assignment process itself by buying and selling to each other the right to operate on the environment state. We derive a class of decentralized reinforcement learning algorithms that are broadly applicable not only to standard reinforcement learning but also to selecting options in semi-MDPs and to dynamically composing computation graphs. Lastly, we demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
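The idea of agents buying and selling the right to operate on the environment state can be made concrete with a toy simulation. The sketch below is a minimal illustration under stated assumptions, not the paper's algorithm. It assumes that each step is run as a second-price (Vickrey) auction in which the highest bidder wins the right to transform the state and pays the second-highest bid. It also assumes that this payment goes to the previous winner, the current "owner" of the state, and that bid and action functions are fixed rather than learned. The names `Agent`, `run_auction_episode`, `bid`, `act`, `reward_fn`, and `gamma` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Agent:
    """Hypothetical self-interested agent: it bids on states and owns one fixed action."""
    name: str
    bid: Callable[[int], float]  # state -> amount offered for the right to act
    act: Callable[[int], int]    # state -> next state after this agent operates on it


def run_auction_episode(
    agents: List[Agent],
    reward_fn: Callable[[int, int], float],
    start_state: int,
    horizon: int,
    gamma: float = 1.0,
) -> List[Tuple[str, float]]:
    """Run one episode of second-price auctions; return (winner, utility) per step.

    Assumption for illustration: payments follow a Vickrey rule and go to the
    previous winner. This is not claimed to be the paper's exact mechanism.
    """
    state = start_state
    record = []  # per step: (winner name, price paid, environment reward)
    for _ in range(horizon):
        # Every agent bids on the current state; the highest bid wins
        # (ties go to the agent with the higher list index).
        bids = sorted(((a.bid(state), i) for i, a in enumerate(agents)), reverse=True)
        winner = agents[bids[0][1]]
        # Second-price rule: the winner pays the second-highest bid (0 if it bid alone).
        price = bids[1][0] if len(bids) > 1 else 0.0
        next_state = winner.act(state)
        record.append((winner.name, price, reward_fn(state, next_state)))
        state = next_state

    utilities = []
    for t, (name, price_paid, reward) in enumerate(record):
        # The winner at step t "sells" the resulting state to the winner at step t + 1,
        # so its revenue is the price that next winner paid; the last winner sells to no one.
        revenue = record[t + 1][1] if t + 1 < len(record) else 0.0
        utilities.append((name, reward + gamma * revenue - price_paid))
    return utilities


# Toy chain: states 0..3, reward 1.0 for the transition that reaches state 3.
agents = [
    Agent("right", bid=lambda s: 0.8, act=lambda s: s + 1),
    Agent("stay", bid=lambda s: 0.1, act=lambda s: s),
]
result = run_auction_episode(
    agents, lambda s, s2: 1.0 if s2 == 3 else 0.0, start_state=0, horizon=3
)
print(result)
```

With `gamma=1`, every payment after the first step is a transfer between agents, so the winners' utilities sum to the total environment reward minus the price paid at the first step. In this toy chain, only the final mover collects the reward. The earlier movers simply pass the price they paid on to the next buyer. This illustrates how a purely local transaction rule can route credit along a trajectory.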

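The study aims to establish a framework for guiding a group of simple, specialized, self-interested agents to solve problems traditionally posed as monolithic single-agent sequential decision problems. It designs a mechanism for defining each agent's learning environment under which the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents. From this mechanism it derives a class of decentralized reinforcement learning algorithms. It also shows the potential advantages of the society's inherent structure for more efficient transfer learning.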

Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions