We consider a team of reinforcement learning agents that concurrently operate
in a common environment, and we develop an approach to efficient coordinated
exploration that is suitable for problems of practical scale. Our approach
builds on seed sampling (Dimakopoulou and Van Roy, 2018) and randomized value
function learning (Osband et al., 2016). We demonstrate that, for simple
tabular contexts, the approach is competitive with previously proposed tabular
model learning methods (Dimakopoulou and Van Roy, 2018). On a
higher-dimensional problem with a neural network value function representation,
the approach learns quickly and requires far fewer agents than alternative
exploration schemes.
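
The abstract does not spell out how seed sampling combines with randomized value functions, so the following is only a rough illustration under stated assumptions, not the paper's algorithm. It assumes a tabular setting. Each agent draws a fixed seed made of a random prior Q-table and a sequence of reward perturbations. All agents share one transition buffer, and each agent fits its own Q-table with regularized fitted Q-iteration pulled toward its prior (an RLSVI-style update). The class `SeededAgent`, the toy `chain_step` environment, and the parameters `prior_scale`, `noise_scale`, and `reg` are hypothetical.

```python
import numpy as np


class SeededAgent:
    """Hypothetical sketch: one concurrent agent whose randomized tabular
    value function is determined by a fixed seed plus the shared data."""

    def __init__(self, seed, n_states, n_actions, gamma=0.9,
                 prior_scale=1.0, noise_scale=0.1, reg=1.0):
        self.rng = np.random.default_rng(seed)
        self.gamma, self.reg, self.noise_scale = gamma, float(reg), noise_scale
        # Seed part 1 (assumption): a random prior Q-table, drawn once and kept fixed.
        self.prior = prior_scale * self.rng.standard_normal((n_states, n_actions))
        # Seed part 2 (assumption): per-transition reward noise, fixed once drawn.
        self.noise = []
        self.q = self.prior.copy()

    def _noise_for(self, n_transitions):
        # Extend this agent's fixed noise sequence to cover every shared transition.
        while len(self.noise) < n_transitions:
            self.noise.append(self.noise_scale * self.rng.standard_normal())
        return self.noise

    def fit(self, buffer, n_iters=30):
        """Regularized fitted Q-iteration on the shared buffer, shrunk toward the prior."""
        noise = self._noise_for(len(buffer))
        q = self.prior.copy()
        for _ in range(n_iters):
            num = self.reg * self.prior  # prior acts like `reg` pseudo-observations
            den = np.full(self.prior.shape, self.reg)
            for (s, a, r, s_next), z in zip(buffer, noise):
                num[s, a] += r + z + self.gamma * q[s_next].max()
                den[s, a] += 1.0
            q = num / den
        self.q = q

    def act(self, s):
        # Act greedily with respect to this agent's own randomized Q-table.
        return int(np.argmax(self.q[s]))


def chain_step(s, a, n_states):
    # Toy chain (hypothetical): action 1 moves right, action 0 moves left;
    # reward 1 only when landing in the rightmost state.
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1)


# Usage: four concurrent agents with distinct seeds share one buffer.
n_states, n_actions = 6, 2
agents = [SeededAgent(seed=k, n_states=n_states, n_actions=n_actions) for k in range(4)]
shared_buffer, states = [], [0] * len(agents)
for t in range(30):
    for k, agent in enumerate(agents):
        a = agent.act(states[k])
        s_next, r = chain_step(states[k], a, n_states)
        shared_buffer.append((states[k], a, r, s_next))
        states[k] = s_next
    for agent in agents:
        agent.fit(shared_buffer)
```

In this sketch, each agent's seed stays fixed, so its value estimate changes only as new shared data arrives, while different seeds keep the agents' behavior diverse.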