Photonic reinforcement learning, which exploits the physical nature of light to
accelerate computation, has recently been studied extensively.
Previous studies utilized quantum interference of photons to achieve
collective decision-making without choice conflicts when solving the
competitive multi-armed bandit problem, a fundamental example of reinforcement
learning. However, the bandit problem deals with a static environment where the
agent's action does not influence the reward probabilities. This study aims to
extend the conventional approach to a more general multi-agent reinforcement
learning setting, targeting the grid world problem. Unlike the conventional approach,
the proposed scheme deals with a dynamic environment where the reward changes
because of agents' actions. A successful photonic reinforcement learning scheme
requires both a photonic system that contributes to the quality of learning and
a suitable algorithm. This study proposes a novel learning algorithm,
discontinuous bandit Q-learning, in view of a potential photonic
implementation. Here, state-action pairs in the environment are regarded as
slot machines in the context of the bandit problem, and the amount by which a
Q-value is updated is regarded as the reward of the bandit problem. We perform numerical
simulations to validate the effectiveness of the bandit algorithm. In addition,
we propose a multi-agent architecture in which agents are indirectly connected
through quantum interference of light, and quantum principles ensure that
state-action pair selections among agents are conflict-free. We
demonstrate that multi-agent reinforcement learning can be accelerated owing to
conflict avoidance among multiple agents.
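
The bandit view of Q-learning described above can be illustrated with a minimal
single-agent sketch. It is not the authors' implementation: the toy corridor
environment, the epsilon-greedy arm selection, the hyperparameters, and all
names (`N_STATES`, `step`, `train`, `arm_value`) are assumptions made for
illustration only. Each state-action pair is treated as a slot machine, the
learner may jump to any pair at each iteration rather than following a
continuous trajectory, and the magnitude of the Q-value update serves as that
arm's bandit reward.

```python
import random

# Hypothetical toy environment: a 1-D corridor with the goal at the right end.
N_STATES = 5
ACTIONS = (-1, +1)                      # move left / move right
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2   # assumed hyperparameters


def step(state, action):
    """Deterministic transition; reward 1 when the goal state is reached."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def train(n_iters=2000, seed=0):
    rng = random.Random(seed)
    # Every (state, action) pair of a non-terminal state is one bandit arm.
    arms = [(s, a) for s in range(N_STATES - 1) for a in ACTIONS]
    q = {arm: 0.0 for arm in arms}
    # Bandit estimate per arm: the most recent |delta Q| observed on that arm.
    # Optimistic initial values make sure every arm is tried at least once.
    arm_value = {arm: 1.0 for arm in arms}

    for _ in range(n_iters):
        # Epsilon-greedy arm selection (an assumed stand-in bandit policy).
        if rng.random() < EPSILON:
            arm = rng.choice(arms)
        else:
            arm = max(arms, key=lambda k: arm_value[k])

        s, a = arm
        nxt, r = step(s, a)
        # The goal state is terminal, so it contributes no future value.
        future = 0.0 if nxt == N_STATES - 1 else max(q[(nxt, b)] for b in ACTIONS)

        # Standard Q-learning update; its size is the arm's bandit reward.
        delta = ALPHA * (r + GAMMA * future - q[arm])
        q[arm] += delta
        arm_value[arm] = abs(delta)
    return q


if __name__ == "__main__":
    q = train()
    for s in range(N_STATES - 1):
        best = max(ACTIONS, key=lambda a: q[(s, a)])
        print(f"state {s}: greedy action {best:+d}")
```

In this sketch, arms whose Q-values are still changing attract more selections,
which is the intuition behind using the update amount as the reward. The
conflict-free multi-agent selection through quantum interference is not
modelled here.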

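This study proposes a photonic reinforcement learning algorithm based on quantum interference, extends the conventional approach to multi-agent reinforcement learning in dynamic environments, and demonstrates that multi-agent reinforcement learning can be accelerated through photonic interference, which avoids conflicts among agents.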