Reinforcement learning has been widely adopted to model dialogue managers in
task-oriented dialogues. However, the user simulators provided by
state-of-the-art dialogue frameworks are only rough approximations of human
behaviour. The ability to learn from a small number of human interactions is
hence crucial, especially in multi-domain and multi-task environments where the
action space is large. We therefore propose to use structured policies to
improve sample efficiency when learning in these kinds of environments. We also
evaluate the impact of learning from human vs simulated experts. Among the
different levels of structure that we tested, graph neural networks (GNNs)
are clearly superior, reaching a success rate above 80% with only 50
dialogues when learning from simulated experts. They also remain superior
when learning from human experts, although a performance drop was observed,
indicating a possible difficulty in capturing the variability of human
strategies. We therefore suggest concentrating future research efforts on
bridging the gap between human data, simulators and automatic evaluators in
dialogue frameworks.

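This study explores the use of structured policies to improve the sample efficiency of reinforcement learning in multi-domain and multi-task environments. Among the levels of structure tested, the authors find that graph neural networks have an advantage, and they suggest that future research focus on bridging human data, simulators and automatic evaluators.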

Few-Shot Structured Policy Learning for Multi-Domain and Multi-Task Dialogues

Task-oriented dialogue systems are designed to achieve specific goals while
conversing with humans. In practice, they may have to handle several domains
and tasks simultaneously. The dialogue manager must therefore be able to take
into account domain changes and plan over different domains/tasks in order to
deal with multi-domain dialogues. However, reinforcement learning in such a
context becomes difficult because the state-action space is larger while
the reward signal remains scarce. Our experimental results suggest that
structured policies based on graph neural networks combined with different
degrees of imitation learning can effectively handle multi-domain dialogues.
The reported experiments underline the benefit of structured policies over
standard policies.

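This study uses structured policies based on graph neural networks, combined with different degrees of imitation learning, to handle multi-domain dialogues effectively. The results show that structured policies outperform standard policies.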

Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues
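
To make the idea of a structured policy based on a graph neural network more concrete, here is a minimal sketch in plain Python. It is not the architecture from the paper: the fully connected slot graph, the single round of mean-aggregated message passing, the feature sizes and every name (`gnn_policy_scores`, `W_self`, `W_msg`, `w_out`) are assumptions chosen for illustration. The point it tries to show is parameter sharing: the same weights are applied to every slot node, so the number of parameters does not grow with the number of slots or domains.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gnn_policy_scores(node_feats, W_self, W_msg, w_out):
    """One message-passing round over a fully connected graph of slot nodes.

    Hypothetical sketch: every node uses the SAME weights (W_self, W_msg,
    w_out), which is the inductive bias of a structured policy.
    Requires at least two nodes so that each node has a neighbour.
    """
    n = len(node_feats)
    scores = []
    for i, h in enumerate(node_feats):
        # Aggregate the neighbours' features with a mean.
        others = [node_feats[j] for j in range(n) if j != i]
        mean_msg = [sum(col) / len(others) for col in zip(*others)]
        # Combine the node's own features with the aggregated message.
        own = matvec(W_self, h)
        msg = matvec(W_msg, mean_msg)
        hidden = [math.tanh(a + b) for a, b in zip(own, msg)]
        # One scalar score per node, e.g. "act on this slot".
        scores.append(sum(w * v for w, v in zip(w_out, hidden)))
    return scores


def softmax(scores):
    """Turn scores into a probability distribution over nodes."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
DIM, HID = 3, 4  # assumed feature and hidden sizes
W_self = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(HID)]
W_msg = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(HID)]
w_out = [rng.uniform(-1, 1) for _ in range(HID)]

# Three made-up slot nodes, each described by a small feature vector.
slots = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.9], [0.5, 0.5, 0.1]]
probs = softmax(gnn_policy_scores(slots, W_self, W_msg, w_out))
print(probs)
```

Because the weights are shared and the aggregation is a mean, reordering the slot nodes only reorders the scores. A flat feed-forward policy over the concatenated state has no such property, which is one plausible reason structured policies need fewer dialogues to learn.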

Dexterous manipulation is a challenging and important problem in robotics.
While data-driven methods are a promising approach, current benchmarks require
simulation or extensive engineering support due to the sample inefficiency of
popular methods. We present benchmarks for the TriFinger system, an open-source
robotic platform for dexterous manipulation and the focus of the 2020 Real
Robot Challenge. The benchmarked methods, which were successful in the
challenge, can be generally described as structured policies, as they combine
elements of classical robotics and modern policy optimization. This inclusion
of inductive biases facilitates sample efficiency, interpretability,
reliability and high performance. The key aspects of this benchmarking are
validation of the baselines across both simulation and the real system, a
thorough ablation study over the core features of each solution, and a
retrospective analysis of the challenge as a manipulation benchmark. The code
and demo videos for this work can be found on our website
(this https URL).

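This study addresses the challenge of dexterous manipulation in robotics, focusing on the TriFinger system. It presents benchmarks built on structured policies that combine elements of classical robotics and modern policy optimization. The baselines are validated both in simulation and on the real system, and the core features of each solution are analysed systematically.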