We propose a novel algorithmic framework for distributional reinforcement
learning, based on learning finite-dimensional mean embeddings of return
distributions. We derive several new algorithms for dynamic programming and
temporal-difference learning based on this framework, provide asymptotic
convergence theory, and examine the empirical performance of the algorithms on
a suite of tabular tasks. Further, we show that this approach can be
straightforwardly combined with deep reinforcement learning, and obtain a new
deep RL agent that improves over baseline distributional approaches on the
Arcade Learning Environment.
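The abstract does not say which feature map the authors use. As a hedged illustration of the idea only, the sketch below assumes polynomial features phi(z) = (1, z, ..., z^k), whose mean embedding is the vector of the first k moments of the return distribution. For this particular choice, the map z -> r + gamma*z acts exactly linearly on the embedding, phi(r + gamma*z) = B_r phi(z), so dynamic programming and TD updates can run entirely in embedding space. For most other feature maps B_r would only be an approximation. The helper names (`bellman_matrix`, `sketch_dp`, `sketch_td_update`), the two-state Markov reward process, and the deterministic per-state rewards are hypothetical choices made for this example.

```python
from math import comb


def bellman_matrix(r, gamma, k):
    """Return B_r with phi(r + gamma * z) = B_r @ phi(z) for phi(z) = (1, z, ..., z**k).

    Row n expands (r + gamma * z)**n = sum_j C(n, j) * r**(n - j) * gamma**j * z**j.
    Assumption: polynomial (moment) features, chosen because B_r is then exact.
    """
    return [[comb(n, j) * r ** (n - j) * gamma ** j if j <= n else 0.0
             for j in range(k + 1)]
            for n in range(k + 1)]


def matvec(B, v):
    return [sum(b * x for b, x in zip(row, v)) for row in B]


def sketch_dp(P, R, gamma, k, iters=200):
    """Dynamic programming on moment embeddings of a Markov reward process.

    P[x][y] is the transition probability x -> y and R[x] a deterministic reward.
    Repeats m(x) <- B_{R[x]} (sum_y P[x][y] m(y)), starting from a point mass at 0.
    """
    n = len(P)
    Bs = [bellman_matrix(R[x], gamma, k) for x in range(n)]
    m = [[1.0] + [0.0] * k for _ in range(n)]
    for _ in range(iters):
        new_m = []
        for x in range(n):
            # B_r is linear, so average the successor embeddings first, then apply it.
            avg = [sum(P[x][y] * m[y][j] for y in range(n)) for j in range(k + 1)]
            new_m.append(matvec(Bs[x], avg))
        m = new_m
    return m


def sketch_td_update(m, x, r, y, gamma, lr):
    """One TD step on the embedding of state x after observing a transition (x, r, y)."""
    target = matvec(bellman_matrix(r, gamma, len(m[x]) - 1), m[y])
    m[x] = [a + lr * (t - a) for a, t in zip(m[x], target)]


# Usage: state 0 pays reward 1 and either stays in state 0 or moves to state 1
# with equal probability; state 1 is absorbing with reward 0.
P = [[0.5, 0.5], [0.0, 1.0]]
R = [1.0, 0.0]
m = sketch_dp(P, R, gamma=0.5, k=2)
print(m[0])  # close to [1.0, 4/3, 40/21]: total mass, E[G], E[G**2]
```

Here the return from state 0 satisfies G = 1 + 0.5 G', where G' is an independent copy of G or 0 with equal probability, giving E[G] = 4/3 and E[G^2] = 40/21. The first moment coincides with the ordinary value function, while the higher coordinates carry the extra distributional information.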

Distributional Bellman Operators over Mean Embeddings

Recently, deep reinforcement learning (RL) methods have been applied
successfully to multi-agent scenarios. Typically, these methods rely on a
concatenation of agent states to represent the information content required for
decentralized decision making. However, concatenation scales poorly to swarm
systems with a large number of homogeneous agents as it does not exploit the
fundamental properties inherent to these systems: (i) the agents in the swarm
are interchangeable and (ii) the exact number of agents in the swarm is
irrelevant. Therefore, we propose a new state representation for deep
multi-agent RL based on mean embeddings of distributions. We treat the agents
as samples of a distribution and use the empirical mean embedding as input for
a decentralized policy. We define different feature spaces of the mean
embedding using histograms, radial basis functions and a neural network learned
end-to-end. We evaluate the representation on two well-known problems from the
swarm literature (rendezvous and pursuit-evasion), in both globally and locally
observable setups. For the local setup, we additionally introduce simple
communication protocols. Of all approaches, the mean embedding representation
using neural network features enables the richest information exchange between
neighboring agents, facilitating the development of more complex collective
strategies.
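To make the representation concrete, the sketch below computes the empirical mean embedding (1/N) sum_i phi(s_i) of a swarm for two of the hand-designed feature spaces named above, radial basis functions and histograms. It assumes each agent state is a 2-D position in the unit square. The RBF centers, bandwidth, and bin count are arbitrary illustrative values, and the function names are hypothetical. The end-to-end learned neural network features are not reproduced here, although any callable feature map could be passed in their place.

```python
import math
import random


def rbf_features(s, centers, sigma):
    """Gaussian RBF activations of a 2-D state s at each center."""
    return [math.exp(-((s[0] - cx) ** 2 + (s[1] - cy) ** 2) / (2 * sigma ** 2))
            for cx, cy in centers]


def histogram_features(s, bins, lo=0.0, hi=1.0):
    """One-hot indicator of the grid cell containing s in [lo, hi]^2 (bins x bins cells)."""
    width = (hi - lo) / bins
    i = max(0, min(int((s[0] - lo) / width), bins - 1))
    j = max(0, min(int((s[1] - lo) / width), bins - 1))
    feat = [0.0] * (bins * bins)
    feat[i * bins + j] = 1.0
    return feat


def mean_embedding(states, feature_fn):
    """Average feature vector over agents: fixed size, independent of agent order and count."""
    feats = [feature_fn(s) for s in states]
    return [sum(col) / len(feats) for col in zip(*feats)]


random.seed(0)
centers = [(x, y) for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)]
swarm = [(random.random(), random.random()) for _ in range(20)]


def rbf(s):
    return rbf_features(s, centers, sigma=0.25)


e_rbf = mean_embedding(swarm, rbf)         # 9 numbers for 20 agents
e_perm = mean_embedding(swarm[::-1], rbf)  # same agents, reversed order
e_copy = mean_embedding(swarm * 3, rbf)    # 60 agents, same empirical distribution
e_hist = mean_embedding(swarm, lambda s: histogram_features(s, bins=4))  # 16 bin frequencies
```

Reordering the agents or replicating each of them leaves the embedding unchanged up to floating-point rounding. These are the two properties cited above: agents are interchangeable and the exact swarm size is irrelevant. The policy input also keeps the same dimension whatever the number of agents.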

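A state representation based on mean embeddings of distributions is proposed for swarm systems with a large number of homogeneous agents. In deep multi-agent reinforcement learning, the mean embedding representation implemented with neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.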