Reinforcement Learning With Temporal Logic Rewards

Dec, 2016

Xiao Li, Cristian-Ioan Vasile, Calin Belta

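TL;DR: This paper proposes Truncated Linear Temporal Logic (TLTL) and a reinforcement learning method that uses the corresponding robustness measure as the reward function, aimed at learning complex tasks in robotic applications. The method showed strong, robust performance in simulation experiments and on a task with a Baxter robot.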
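To make the idea of a robustness measure used as a reward more concrete, here is a minimal sketch of the usual min/max quantitative semantics for temporal logic formulas over a finite trajectory. It is an illustration under stated assumptions, not the authors' implementation: the one-dimensional state, the helper names (`less_than`, `negate`, `conj`, `eventually`, `always`), and the example trajectory are all hypothetical.

```python
from typing import Callable, Sequence

# Assumption: a trajectory is a finite sequence of scalar states.
Trajectory = Sequence[float]
Robustness = Callable[[Trajectory], float]


def less_than(c: float) -> Robustness:
    """Predicate 's < c' checked at the first state; robustness is c - s."""
    return lambda traj: c - traj[0]


def negate(phi: Robustness) -> Robustness:
    """Negation flips the sign of the robustness."""
    return lambda traj: -phi(traj)


def conj(phi: Robustness, psi: Robustness) -> Robustness:
    """Conjunction is only as satisfied as its weakest part."""
    return lambda traj: min(phi(traj), psi(traj))


def eventually(phi: Robustness) -> Robustness:
    """'Eventually phi': best robustness over all suffixes of the trajectory."""
    return lambda traj: max(phi(traj[t:]) for t in range(len(traj)))


def always(phi: Robustness) -> Robustness:
    """'Always phi': worst robustness over all suffixes of the trajectory."""
    return lambda traj: min(phi(traj[t:]) for t in range(len(traj)))


# Hypothetical task: eventually get below 1.0 while always staying below 4.0.
spec = conj(eventually(less_than(1.0)), always(less_than(4.0)))

# Hypothetical episode; the scalar robustness serves as the episode reward.
trajectory = [3.0, 2.0, 0.5, 1.5]
reward = spec(trajectory)
```

A positive value means the trajectory satisfies the specification, and a larger value means it is satisfied with more margin, which is what lets a single scalar stand in for a hand-designed reward.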

Abstract

The reward function plays a critical role in reinforcement learning (RL). It is a place where designers specify the desired behavior and impose important constraints for the system. While most reward functions us