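TL;DR: This paper proposes Truncated Linear Temporal Logic (TLTL) and a reinforcement learning method that uses the corresponding robustness degree as the reward function, aimed at learning complex tasks in robotic applications. The method showed strong, robust performance both in simulation experiments and on tasks with a Baxter robot.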
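To make "robustness degree as reward" concrete, here is a minimal sketch. It assumes the usual min/max quantitative semantics for temporal logics over finite trajectories; the paper's exact TLTL definitions may differ. The helper names (`less_than`, `eventually`, `always`, `conj`) and the 1-D "distance to goal" state are hypothetical, chosen only for illustration.

```python
from typing import Callable, Sequence

# A robustness function maps (trajectory, start index) to a real number.
# Positive means the formula is satisfied; the magnitude is the margin.
Rho = Callable[[Sequence[float], int], float]


def less_than(f: Callable[[float], float], c: float) -> Rho:
    """Predicate f(s) < c; robustness at step t is c - f(s_t)."""
    return lambda traj, t: c - f(traj[t])


def conj(a: Rho, b: Rho) -> Rho:
    """Conjunction: the weaker of the two margins."""
    return lambda traj, t: min(a(traj, t), b(traj, t))


def eventually(a: Rho) -> Rho:
    """Eventually: the best margin over the rest of the trajectory."""
    return lambda traj, t: max(a(traj, k) for k in range(t, len(traj)))


def always(a: Rho) -> Rho:
    """Always: the worst margin over the rest of the trajectory."""
    return lambda traj, t: min(a(traj, k) for k in range(t, len(traj)))


# Hypothetical task: reach within 1.0 of the goal at some point,
# while never drifting 4.0 or more away from it.
dist = lambda s: s  # assumption: the state is already the distance to goal
spec = conj(eventually(less_than(dist, 1.0)), always(less_than(dist, 4.0)))

good = [3.0, 2.0, 0.5, 1.0]   # reaches the goal region at step 2
bad = [3.0, 2.5, 2.0]         # never gets closer than 2.0

reward_good = spec(good, 0)   # episode-level reward for the good rollout
reward_bad = spec(bad, 0)     # episode-level reward for the bad rollout
print(reward_good, reward_bad)
```

In this sketch the whole trajectory receives a single scalar reward: it is positive when the specification holds and negative when it is violated, and it grows as the trajectory satisfies the specification with more margin.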
Abstract
The reward function plays a critical role in reinforcement learning (RL). It is a place where designers specify the desired behavior and impose important constraints for the system. While most reward functions us