Robot motor skills can be learned through deep reinforcement learning (DRL),
with neural networks serving as state-action mappings. While the selection of
state observations is crucial, it has so far lacked quantitative analysis.
Here, we present a systematic saliency analysis that quantitatively
evaluates the relative importance of different feedback states for motor skills
learned through DRL. Our approach can identify the most essential feedback
states for locomotion skills, including balance recovery, trotting, bounding,
pacing and galloping. By using only key states, including joint positions, the
gravity vector, and base linear and angular velocities, we demonstrate that a
simulated quadruped robot can achieve robust performance in various test
scenarios across these distinct skills. The benchmarks using task performance
metrics show that locomotion skills learned with key states can achieve
comparable performance to those with all states, and that task performance or
learning success rate drops significantly if key states are missing. This
work provides quantitative insights into the relationship between state
observations and specific types of motor skills, serving as a guideline for
robot motor learning. The proposed method is applicable to any differentiable
state-action mapping, such as neural-network-based control policies, enabling
the learning of a wide range of motor skills with minimal sensing dependencies.
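
The idea of ranking feedback states by their influence on a differentiable
state-action mapping can be pictured with a small sketch. The abstract does not
say which saliency measure is used, so this example assumes the simplest one:
the mean absolute input gradient of the actions with respect to each state
dimension. The toy `policy`, the `WEIGHTS` matrix, and the three-dimensional
state are hypothetical stand-ins, not the paper's controller.

```python
import math
import random

def policy(state, weights):
    """Toy differentiable policy: action_j = tanh(sum_i W[j][i] * s_i)."""
    return [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]

def jacobian(state, weights):
    """Analytic d(action_j)/d(state_i) for the toy policy above."""
    actions = policy(state, weights)
    return [[(1.0 - a * a) * w for w in row] for a, row in zip(actions, weights)]

def saliency(states, weights):
    """Absolute sensitivity of the actions to each state dimension,
    accumulated over sampled states and normalized to sum to 1."""
    totals = [0.0] * len(states[0])
    for s in states:
        for row in jacobian(s, weights):
            for i, grad in enumerate(row):
                totals[i] += abs(grad)
    norm = sum(totals)
    return [t / norm for t in totals]

# Hypothetical setup: 3 state dimensions, 2 actions.
# Dimension 2 has zero weight, so the policy ignores it entirely.
WEIGHTS = [[1.5, 0.2, 0.0], [-0.8, 0.1, 0.0]]
rng = random.Random(0)
STATES = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
SCORES = saliency(STATES, WEIGHTS)
print(SCORES)
```

In this toy setup the ignored dimension receives a score of zero and the
heavily weighted dimension ranks first, which is the kind of ordering one would
use to decide which states to keep. For a trained neural network policy the
Jacobian would come from automatic differentiation rather than a hand-written
formula.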

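Using deep reinforcement learning with neural networks as state-action mappings, a systematic, quantitative saliency analysis identifies the key feedback states a robot needs to learn locomotion skills, including balance recovery, trotting, bounding, pacing and galloping.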

Identifying Important Sensory Feedback for Learning Locomotion Skills

We study the robustness of reinforcement learning (RL) with adversarially
perturbed state observations, which aligns with the setting of many adversarial
attacks on deep reinforcement learning (DRL) and is also important for rolling
out real-world RL agents under unpredictable sensing noise. With a fixed agent
policy, we demonstrate that an optimal adversary to perturb state observations
can be found, which is guaranteed to obtain the worst-case agent reward. For
DRL settings, this leads to a novel empirical adversarial attack on RL agents
via a learned adversary that is much stronger than previous ones. To enhance
the robustness of an agent, we propose a framework of alternating training with
learned adversaries (ATLA), which trains an adversary online together with the
agent using policy gradient following the optimal adversarial attack framework.
Additionally, inspired by the analysis of state-adversarial Markov decision
process (SA-MDP), we show that past states and actions (history) can be useful
for learning a robust agent, and we empirically find that an LSTM-based policy
can be more robust under adversaries. Empirical evaluations on a few continuous
control environments show that ATLA achieves state-of-the-art performance under
strong adversaries. Our code is available at
this https URL
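
The claim that an optimal adversary exists for a fixed agent policy can be
pictured on a tabular toy problem: once the policy is frozen, the adversary
faces an ordinary MDP in which its "action" is the observation it shows the
agent, and value iteration that minimizes the agent's return yields the
worst-case attack. Everything below is a hypothetical illustration (a
five-state chain, a hand-written `agent_policy`, and a perturbation budget
`eps` measured in states); the paper itself targets DRL settings with a learned
adversary rather than exact value iteration.

```python
GAMMA = 0.9
N_STATES = 5
GOAL = 2

def agent_policy(observed):
    """Fixed agent: step toward the goal based on the observed state."""
    if observed < GOAL:
        return 1
    if observed > GOAL:
        return -1
    return 0

def step(state, action):
    """Deterministic chain dynamics; reward 1 when the next state is the goal."""
    nxt = min(N_STATES - 1, max(0, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0)

def perturbation_set(state, eps):
    """Observations the adversary may show instead of the true state."""
    return [o for o in range(N_STATES) if abs(o - state) <= eps]

def solve_adversary(eps, iters=500):
    """Value iteration on the adversary's MDP: in each true state it picks the
    observation that minimizes the agent's discounted return."""
    values = [0.0] * N_STATES
    attack = list(range(N_STATES))
    for _ in range(iters):
        new_values = []
        for s in range(N_STATES):
            best_v, best_o = None, s
            for o in perturbation_set(s, eps):
                nxt, reward = step(s, agent_policy(o))
                v = reward + GAMMA * values[nxt]
                if best_v is None or v < best_v:
                    best_v, best_o = v, o
            new_values.append(best_v)
            attack[s] = best_o
        values = new_values
    return values, attack

# eps=0 means no perturbation, which gives the agent's nominal values.
NOMINAL_VALUES, _ = solve_adversary(eps=0)
WORST_VALUES, ATTACK = solve_adversary(eps=1)
print(NOMINAL_VALUES, WORST_VALUES, ATTACK)
```

Comparing `NOMINAL_VALUES` with `WORST_VALUES` shows how much return a bounded
observation attack can remove from this particular fixed policy, and `ATTACK`
records which false observation achieves it in each state.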

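Studies the robustness of reinforcement learning under adversarial attacks (that is, perturbations of state observations) and proposes the ATLA framework to make agents more robust: an adversary is trained online together with the agent, following the optimal adversarial attack framework, and the policy can draw on past states and actions (history), which improves performance in experiments under strong adversaries.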