Reinforcement Learning (RL) algorithms often suffer from low training
efficiency. One strategy to mitigate this issue is to combine the environment
model with a model-based planning algorithm, such as Monte Carlo Tree Search
(MCTS) or Value Iteration (VI). Such planners still lead to intensive
computations; the major limitation of VI, for example, is the need to iterate
over a large tensor. We focus on improving the training efficiency of RL
algorithms by improving the efficiency of the value learning process. In
deterministic environments with discrete state and action spaces, a
non-branching sequence of transitions moves the agent without deviating from
intermediate states; we call such a sequence a highway. On these non-branching
highways, the value-updating process can be merged into a single step instead
of iterating the value step by step. Based
on this observation, we propose a novel graph structure, named highway graph,
to model the state transition. Our highway graph compresses the transition
model into a concise graph, where edges can represent multiple state
transitions to support value propagation across multiple time steps in each
iteration. Running the VI algorithm on the highway graph thus yields a more
efficient value learning approach. By integrating the highway graph into RL (as
a model-based off-policy RL method), RL training can be remarkably accelerated
in the early stages (within 1 million frames). Comparison against various
baselines on four categories of environments reveals that our method
outperforms both representative and novel model-free and model-based RL
algorithms, achieving 10 to more than 150 times higher training efficiency
while maintaining an equal or superior expected return, as confirmed by
carefully conducted analyses. Moreover, a deep neural network-based agent is
trained using the highway graph, resulting in better generalization and lower
storage costs.
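
As a rough illustration of the idea described above, the following Python
sketch compresses non-branching runs of a deterministic transition table into
single highway edges and then runs value iteration on the compressed graph. The
function names, the dictionary layout, and the rule that an intermediate state
has exactly one incoming and one outgoing transition are assumptions made for
this example; it is not the authors' implementation.

```python
from collections import defaultdict


def build_highway_graph(transitions, gamma):
    """Compress a deterministic transition table into highway edges.

    transitions: {(state, action): (next_state, reward)}  (assumed layout)
    Returns {state: [(action, end_state, discounted_reward, num_steps), ...]}
    containing only non-intermediate states that have outgoing transitions.
    """
    out_edges = defaultdict(list)
    in_degree = defaultdict(int)
    states = set()
    for (s, a), (s2, r) in transitions.items():
        out_edges[s].append((a, s2, r))
        in_degree[s2] += 1
        states.update((s, s2))

    def is_intermediate(s):
        # Assumption: an intermediate state has exactly one way in and one way out.
        return len(out_edges[s]) == 1 and in_degree[s] == 1

    highway = defaultdict(list)
    for s in states:
        if is_intermediate(s):
            continue
        for a, nxt, r in out_edges[s]:
            total, steps = r, 1
            # Follow the non-branching run, accumulating the discounted reward.
            while is_intermediate(nxt):
                _, nxt2, r2 = out_edges[nxt][0]
                total += (gamma ** steps) * r2
                steps += 1
                nxt = nxt2
            highway[s].append((a, nxt, total, steps))
    return highway


def value_iteration(highway, gamma, sweeps=100):
    """Value iteration where one edge may span several time steps."""
    values = defaultdict(float)  # states without outgoing edges keep value 0
    for _ in range(sweeps):
        for s, edges in highway.items():
            values[s] = max(
                total + (gamma ** steps) * values[nxt]
                for _, nxt, total, steps in edges
            )
    return values


# Toy environment: A -> B -> C -> D is a highway; A can also stop at E.
demo_transitions = {
    ("A", "go"): ("B", 0.0),
    ("B", "go"): ("C", 0.0),
    ("C", "go"): ("D", 1.0),
    ("A", "stop"): ("E", 0.0),
}
highway = build_highway_graph(demo_transitions, gamma=0.9)
values = value_iteration(highway, gamma=0.9)
```

In this toy example the three transitions from A to D collapse into one edge,
so the reward at the end of the run reaches A in a single update rather than
after three step-by-step updates.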

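To improve the training efficiency of RL algorithms, this work builds on the highway observation to propose a novel graph structure, the highway graph, for modeling state transitions. It remarkably accelerates RL training in the early stages and outperforms other model-free and model-based RL algorithms. In addition, a deep neural network agent trained on the highway graph generalizes better and has lower storage costs.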

Highway Graph to Accelerate Reinforcement Learning

An important application of interactive machine learning is extending or
amplifying the cognitive and physical capabilities of a human. To accomplish
this, machines need to learn about their human users' intentions and adapt to
their preferences. In most current research, a user conveys preferences to
a machine through explicit corrective or instructive feedback; such feedback
imposes a cognitive load on the user and is expensive in terms of human effort.
The primary objective of the current work is to demonstrate that a learning
agent can reduce the amount of explicit feedback required for adapting to the
user's preferences pertaining to a task by learning to perceive a value of its
behavior from the human user, particularly from the user's facial
expressions---we call this face valuing. We empirically evaluate face valuing
on a grip selection task. Our preliminary results suggest that an agent can
quickly adapt to a user's changing preferences with minimal explicit feedback
by learning a value function that maps facial features extracted from a camera
image to expected future reward. We believe that an agent learning to perceive
a value from the body language of its human user is complementary to existing
interactive machine learning approaches and will help in creating successful
human-machine interactive applications.
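
To make the notion of a value function over facial features concrete, the
sketch below applies a single temporal-difference update to a linear value
estimate. The abstract does not state which learning rule or function
approximator was used, so the TD(0) rule, the linear form, the feature vectors,
and the names `td0_update`, `alpha`, and `gamma` are all assumptions chosen for
illustration.

```python
def td0_update(weights, features, reward, next_features, alpha=0.1, gamma=0.9):
    """One TD(0) step for a linear value function V(x) = w . x.

    features / next_features: hypothetical facial-feature vectors taken from
    consecutive camera frames; reward: the explicit feedback signal, if any.
    """
    value = sum(w * x for w, x in zip(weights, features))
    next_value = sum(w * x for w, x in zip(weights, next_features))
    td_error = reward + gamma * next_value - value
    # Move each weight in proportion to its feature and the TD error.
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


# Example: a frame with the first feature active, followed by reward 1.
new_weights = td0_update([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 0.0])
```

Under these assumptions, only the weight of the active feature changes, so the
estimate learns to associate that facial feature with future reward.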

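This paper studies how learning to perceive value from facial expressions can reduce the explicit feedback needed in human-machine interaction. It reports good experimental results, and the approach can complement existing work in the human-machine interaction field.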