In this paper we investigate the notion of legibility in sequential decision
tasks under uncertainty. Previous works that extend legibility to scenarios
beyond robot motion either focus on deterministic settings or are
computationally too expensive. Our proposed approach, dubbed PoL-MDP, is able
to handle uncertainty while remaining computationally tractable. We establish
the advantages of our approach over state-of-the-art approaches in several
simulated scenarios of different complexity. We also showcase the use of our
legible policies as demonstrations for an inverse reinforcement learning agent,
establishing their superiority over the commonly used demonstrations based
on the optimal policy. Finally, we assess the legibility of our computed
policies through a user study where people are asked to infer the goal of a
mobile robot following a legible policy by observing its actions.
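The user study asks people to infer a robot's goal from its actions. A common way to model such an observer, assumed here for illustration and not taken from the PoL-MDP paper, is Bayesian goal inference: P(g | actions) is proportional to P(g) times the product of pi_g(a_t | s_t) over the observed steps. The goals, states, actions, and per-goal policies below are hypothetical. The sketch shows why legibility matters: an action that only one goal's policy takes ("west") reveals the goal at once, while an action shared by both policies ("north") leaves the observer at the prior.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: policy[state][action] = probability.
Policy = Dict[str, Dict[str, float]]


def goal_posterior(
    trajectory: List[Tuple[str, str]],
    policies: Dict[str, Policy],
    prior: Dict[str, float],
) -> Dict[str, float]:
    """Return P(goal | observed (state, action) pairs) via Bayes' rule,
    assuming the observer knows each goal's stochastic policy."""
    scores = {}
    for goal, policy in policies.items():
        likelihood = 1.0
        for state, action in trajectory:
            # Unseen states or actions get probability 0 under this goal.
            likelihood *= policy.get(state, {}).get(action, 0.0)
        scores[goal] = prior[goal] * likelihood
    total = sum(scores.values())
    if total == 0.0:
        # No goal explains the observations; fall back to the prior.
        return dict(prior)
    return {goal: score / total for goal, score in scores.items()}


# Hypothetical example: a robot in a hall heading to one of two doors.
POLICIES = {
    "left_door": {"hall": {"west": 0.8, "north": 0.2}},
    "right_door": {"hall": {"east": 0.8, "north": 0.2}},
}
PRIOR = {"left_door": 0.5, "right_door": 0.5}

if __name__ == "__main__":
    # Ambiguous action: prints {'left_door': 0.5, 'right_door': 0.5}
    print(goal_posterior([("hall", "north")], POLICIES, PRIOR))
    # Legible action: prints {'left_door': 1.0, 'right_door': 0.0}
    print(goal_posterior([("hall", "west")], POLICIES, PRIOR))
```

Under this observer model, a legible policy is one that favors actions which push the posterior toward the true goal early, even when they are not strictly optimal for reaching it.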

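This work studies the notion of legibility in sequential decision tasks under uncertainty. It proposes a method called PoL-MDP that handles uncertainty while remaining computationally tractable, demonstrates its advantages over state-of-the-art approaches in several simulated scenarios, and shows that its legible policies can serve as demonstrations for inverse reinforcement learning. The legibility of the computed policies is evaluated through a user study.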

"Guess what I'm doing": Extending legibility to sequential decision tasks

Recent years have seen significant advances in explainable AI as the need to
understand deep learning models has gained importance with the increased
emphasis on trust and ethics in AI. Comprehensible models for sequential
decision tasks are a particular challenge as they require understanding not
only individual predictions but a series of predictions that interact with
environmental dynamics. We present a framework for learning comprehensible
models of sequential decision tasks in which agent strategies are characterized
using temporal logic formulas. Given a set of agent traces, we first cluster
the traces using a novel embedding method that captures frequent action
patterns. We then search for logical formulas that explain the agent strategies
in the different clusters. We evaluate our framework on combat scenarios in
StarCraft II (SC2), using traces from a handcrafted expert policy and a trained
reinforcement learning agent. We implemented a feature extractor for SC2
environments that extracts traces from agent replays as sequences of high-level
features describing both the state of the environment and the agent's local
behavior. We further designed a visualization tool depicting the movement of
units in the environment, which helps users understand how different task
conditions lead to distinct agent behavior patterns in each trace cluster.
Experimental results show that our framework is capable of separating agent
traces into distinct groups of behaviors for which our approach to strategy
inference produces consistent, meaningful, and easily understood strategy
descriptions.
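The abstract does not describe its novel trace embedding in detail, so the following is only a stand-in to illustrate the idea of clustering traces by frequent action patterns. It assumes each trace is a list of hypothetical high-level action labels, embeds it as a bag of action bigrams (so recurring patterns dominate the vector), compares embeddings with cosine similarity, and groups them with a simple greedy leader clustering. Both the embedding and the clustering are swapped-in techniques, not the paper's method.

```python
from collections import Counter
from math import sqrt
from typing import List, Sequence


def ngram_embedding(trace: Sequence[str], n: int = 2) -> Counter:
    """Count every contiguous length-n action subsequence in a trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def cluster(traces: List[Sequence[str]], threshold: float = 0.5, n: int = 2) -> List[List[int]]:
    """Greedy leader clustering: put each trace in the first cluster whose
    leader is at least `threshold` similar; otherwise open a new cluster."""
    embeddings = [ngram_embedding(t, n) for t in traces]
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for i, emb in enumerate(embeddings):
        for c, leader in enumerate(leaders):
            if cosine(emb, embeddings[leader]) >= threshold:
                clusters[c].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


# Hypothetical high-level action traces (labels are illustrative only).
TRACES = [
    ["attack", "retreat", "attack", "retreat"],            # hit-and-run
    ["attack", "retreat", "attack", "retreat", "attack"],  # hit-and-run
    ["move", "move", "attack", "attack"],                  # advance, then fight
    ["move", "move", "move", "attack"],                    # advance, then fight
]

if __name__ == "__main__":
    print(cluster(TRACES))  # prints [[0, 1], [2, 3]]
```

In the framework described above, each resulting group would then be passed to the strategy-inference step that searches for temporal logic formulas explaining it.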

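This study proposes a framework for learning comprehensible models of sequential decision tasks, in which agent strategies are characterized by temporal logic formulas. Agent traces are clustered using an embedding method, and logical formulas are derived that explain the agent strategies in each cluster. Using a purpose-built feature extractor and visualization tool, the framework is evaluated on combat scenarios in StarCraft II. Experimental results show that the framework separates agent traces into distinct behavior groups and produces consistent, meaningful, and easily understood strategy descriptions for each group.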

A Framework for Understanding and Visualizing Strategies of RL Agents

Reinforcement learning (RL) is a popular paradigm for addressing sequential
decision tasks in which the agent has only limited environmental feedback.
Despite many advances over the past three decades, learning in many domains
still requires a large amount of interaction with the environment, which can be
prohibitively expensive in realistic scenarios. To address this problem,
transfer learning has been applied to reinforcement learning such that
experience gained in one task can be leveraged when starting to learn the next,
harder task. More recently, several lines of research have explored how tasks,
or data samples themselves, can be sequenced into a curriculum for the purpose
of learning a problem that may otherwise be too difficult to learn from
scratch. In this article, we present a framework for curriculum learning (CL)
in reinforcement learning, and use it to survey and classify existing CL
methods in terms of their assumptions, capabilities, and goals. Finally, we use
our framework to find open problems and suggest directions for future RL
curriculum learning research.

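This article presents a framework for curriculum learning in reinforcement learning and uses it to classify and survey existing curriculum learning methods, identifying open problems and proposing directions for future research.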