Neural Style Transfer (NST) refers to a class of algorithms that
manipulate an element, most often an image, so that it adopts the appearance or
style of another. Each element is defined as a combination of Content and Style: the
Content can be conceptually defined as the "what" and the Style as the "how" of
said element. In this context, we propose a custom NST framework for
transferring a set of styles to the motion of a robotic manipulator, e.g., the
same robotic task can be carried out in an angry, happy, calm, or sad way. An
autoencoder architecture extracts and defines the Content and the Style of the
target robot motions. A Twin Delayed Deep Deterministic Policy Gradient (TD3)
network generates the robot control policy using the loss defined by the
autoencoder. The proposed Neural Policy Style Transfer TD3 (NPST3) alters the
robot motion by introducing the trained style. Such an approach can be
implemented either offline, for carrying out autonomous robot motions in
dynamic environments, or online, for adapting at runtime the style of a
teleoperated robot. The considered styles can be learned online from human
demonstrations. We evaluated the framework with 73 human volunteers, asking
them to recognize the style behind some representative robotic motions. Results
show a good recognition rate, indicating that it is possible to convey
different styles to a robot using this approach.
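
As a rough illustration of the TD3 component mentioned above, the following sketch shows two standard TD3 ingredients for a single scalar transition: target policy smoothing and the clipped double-Q target. The abstract does not specify how NPST3 turns the autoencoder loss into a learning signal, so `reward` here is only a placeholder, and the function names are hypothetical rather than taken from the authors' code.

```python
def smooth_target_action(action: float, noise: float, noise_clip: float,
                         low: float, high: float) -> float:
    """Target policy smoothing: add clipped noise to the target action,
    then clip the result to the valid action range [low, high]."""
    clipped_noise = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, action + clipped_noise))


def td3_target(reward: float, done: bool, next_q1: float, next_q2: float,
               gamma: float = 0.99) -> float:
    """Clipped double-Q target: bootstrap from the smaller of the two
    target-critic estimates to limit overestimation.

    `reward` is a placeholder; in NPST3 it would presumably be derived
    from the autoencoder-based loss (an assumption, not stated in detail).
    """
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * min(next_q1, next_q2)
```

Both critics would be regressed toward the value returned by `td3_target`, while the actor is updated less frequently than the critics (the "delayed" part of TD3).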

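We propose a custom neural style transfer framework (NPST3) for transferring a set of styles to the motion of a robotic manipulator. An autoencoder defines the Content and Style of the target robot motions, a robot control policy is generated, and the robot's motion is altered by introducing the trained style. A survey of human volunteers indicates that this method can convey different styles to a robot.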

Neural Style Transfer with Twin-Delayed DDPG for Shared Control of Robotic Manipulators

Deep neuroevolution and deep reinforcement learning (deep RL) algorithms are
two popular approaches to policy search. The former is widely applicable and
rather stable, but suffers from low sample efficiency. By contrast, the latter
is more sample efficient, but the most sample efficient variants are also
rather unstable and highly sensitive to hyperparameter settings. So far, these
families of methods have mostly been compared as competing tools. However, an
emerging approach consists of combining them so as to get the best of both
worlds. Two previously existing combinations use either an ad hoc evolutionary
algorithm or a goal exploration process together with the Deep Deterministic
Policy Gradient (DDPG) algorithm, a sample efficient off-policy deep RL
algorithm. In this paper, we propose a different combination scheme using the
simple cross-entropy method (CEM) and Twin Delayed Deep Deterministic Policy
Gradient (TD3), another off-policy deep RL algorithm that improves on DDPG.
We evaluate the resulting method, CEM-RL, on a set of benchmarks classically
used in deep RL. We show that CEM-RL benefits from several advantages over its
competitors and offers a satisfactory trade-off between performance and sample
efficiency.
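
To make the evolutionary half of the combination concrete, here is a minimal sketch of a plain cross-entropy method with a diagonal Gaussian, written in pure Python. It is not CEM-RL itself: the TD3 gradient updates that CEM-RL combines with the population are omitted, and the function name, defaults, and the small `noise` term that keeps the variance from collapsing are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def cem_maximize(score: Callable[[List[float]], float],
                 mean: Sequence[float],
                 sigma: float = 1.0,
                 pop_size: int = 50,
                 elite_frac: float = 0.2,
                 iterations: int = 40,
                 noise: float = 1e-3,
                 seed: int = 0) -> Tuple[List[float], List[float]]:
    """Plain CEM: sample a population, keep the best-scoring elites,
    and refit the sampling distribution to those elites."""
    rng = random.Random(seed)
    dim = len(mean)
    mean = list(mean)
    std = [sigma] * dim
    n_elite = max(1, int(pop_size * elite_frac))

    for _ in range(iterations):
        # Sample candidate parameter vectors from the current Gaussian.
        population = [[rng.gauss(mean[d], std[d]) for d in range(dim)]
                      for _ in range(pop_size)]
        # Rank candidates by score, best first, and keep the elites.
        population.sort(key=score, reverse=True)
        elites = population[:n_elite]
        # Refit the mean and per-dimension standard deviation to the elites.
        mean = [sum(e[d] for e in elites) / n_elite for d in range(dim)]
        std = [(sum((e[d] - mean[d]) ** 2 for e in elites) / n_elite
                + noise) ** 0.5
               for d in range(dim)]
    return mean, std
```

In a policy-search setting, each sampled vector would hold the weights of a policy network and `score` would be the return of an episode run with those weights. A simple quadratic such as `lambda p: -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2)` is enough to try the function out.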

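This paper proposes a new method, CEM-RL, that combines deep neuroevolution with deep reinforcement learning by pairing Twin Delayed Deep Deterministic Policy Gradient (TD3) with the cross-entropy method (CEM). Evaluated on a set of deep RL benchmarks, CEM-RL achieves a satisfactory trade-off between performance and sample efficiency.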