Reinforcement learning (RL) can solve complex tasks such as Go, often with
stronger performance than humans. However, the learned behaviors are
usually fixed to specific tasks and unable to adapt to different contexts. Here
we consider the case of adapting RL agents to different time restrictions, such
as finishing a task within a given time limit that might change from one task
execution to the next. We define such problems as Time Adaptive Markov Decision
Processes and introduce two model-free, value-based algorithms: the Independent
Gamma-Ensemble and the n-Step Ensemble. In contrast to classical approaches,
they allow zero-shot adaptation between different time restrictions. The
proposed approaches represent general mechanisms to handle time-adaptive tasks,
making them compatible with many existing RL methods, algorithms, and
scenarios.
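The update rules of the two ensembles are not given here. As a hypothetical illustration of how a value ensemble indexed by the remaining time can support zero-shot switching between time limits, the sketch below learns one Q-table per number of remaining steps n = 1..N with finite-horizon, model-free Q-learning, then acts greedily with the table that matches the time left. This is an assumption chosen for illustration, not the paper's n-Step Ensemble; the chain environment, rewards, discount factor, and all function names are invented.

```python
import random

# Hypothetical deterministic chain: states 0..5, the agent starts in state 1.
# Reaching state 0 pays 1, reaching state 5 pays 10; both are terminal.
N_STATES = 6
START = 1
TERMINAL_REWARDS = {0: 1.0, 5: 10.0}
ACTIONS = (-1, +1)  # move left, move right
GAMMA = 0.9  # assumed discount factor, used only to break ties between paths


def step(state, action):
    """Return (next_state, reward, done) for the hypothetical chain."""
    nxt = state + action
    reward = TERMINAL_REWARDS.get(nxt, 0.0)
    return nxt, reward, nxt in TERMINAL_REWARDS


def train(max_horizon, updates=20000, alpha=0.5, seed=0):
    """Learn q[n][s][a]: discounted return with n steps left (q[0] stays 0)."""
    rng = random.Random(seed)
    q = [[[0.0] * len(ACTIONS) for _ in range(N_STATES)]
         for _ in range(max_horizon + 1)]
    non_terminal = [s for s in range(N_STATES) if s not in TERMINAL_REWARDS]
    for _ in range(updates):
        s = rng.choice(non_terminal)
        a = rng.randrange(len(ACTIONS))
        s2, r, done = step(s, ACTIONS[a])
        # One sampled transition updates every member of the ensemble:
        # the n-step target bootstraps from the (n-1)-step table.
        for n in range(1, max_horizon + 1):
            target = r if done else r + GAMMA * max(q[n - 1][s2])
            q[n][s][a] += alpha * (target - q[n][s][a])
    return q


def run_episode(q, time_limit):
    """Act greedily with the table that matches the remaining time."""
    s, total = START, 0.0
    for remaining in range(time_limit, 0, -1):
        values = q[remaining][s]
        a = max(range(len(ACTIONS)), key=lambda i: values[i])
        s, r, done = step(s, ACTIONS[a])
        total += r
        if done:
            break
    return total


q = train(max_horizon=10)
for limit in (2, 3, 4, 10):
    print(limit, run_episode(q, limit))
```

The same trained tables serve every time limit up to `max_horizon` without retraining: with 3 or fewer steps the sketch takes the nearby small reward, and with 4 or more it travels to the distant large one. Time limits larger than `max_horizon` would need a longer ensemble.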
