We present a modular approach to \emph{reinforcement learning} (RL) in
environments consisting of simpler components evolving in parallel. A
monolithic view of such modular environments may be prohibitively large to
learn, or may require unrealizable communication between the components in the
form of a centralized controller. Our approach is based on the
assume-guarantee paradigm, in which the optimal controller for each
component is synthesized in isolation by making \emph{assumptions} about the
behavior of neighboring components and providing \emph{guarantees} about
its own behavior. We express these \emph{assume-guarantee contracts} as
regular languages and provide automatic translations to scalar rewards to be
used in RL. By combining local probabilities of satisfaction for each
component, we provide a lower bound on the probability of satisfaction of the
complete system. By solving a Markov game for each component, RL can produce a
controller for each component that maximizes this lower bound. Each controller
uses the information it receives through communication and observation,
together with any knowledge of a coarse model of the other agents. We experimentally demonstrate
the efficiency of the proposed approach on a variety of case studies.
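
To illustrate how local satisfaction probabilities can combine into a
system-level lower bound, consider the following generic sketch. It is the
standard union bound, not necessarily the bound derived in this work, and the
symbols $n$, $E_i$, and $p_i$ are illustrative notation introduced here. Let
$E_i$ be the event that component $i$ meets its requirement in the composed
system, and suppose $\Pr(E_i) \ge p_i$ for $i = 1, \dots, n$. Then, without any
independence assumption,
\[
  \Pr\Bigl(\bigcap_{i=1}^{n} E_i\Bigr)
  \;\ge\; 1 - \sum_{i=1}^{n} \bigl(1 - \Pr(E_i)\bigr)
  \;\ge\; 1 - \sum_{i=1}^{n} (1 - p_i).
\]
For instance, three components with $p_i = 0.95$ each would yield a system-level
bound of $1 - 3 \times 0.05 = 0.85$.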

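We propose a modular approach to reinforcement learning in which the environment consists of simple components evolving in parallel, and the optimal controller for each component is synthesized independently by making assumptions about the behavior of neighboring components and providing guarantees about its own behavior. We express the assume-guarantee contracts as regular languages and automatically translate them into scalar rewards for use in RL. By combining the probabilities of satisfaction for each component, we provide a lower bound on the probability of satisfaction of the complete system, and solving a Markov game for each component yields controllers that maximize this lower bound.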