Reinforcement learning (RL) excels at optimizing policies for discrete-time
Markov decision processes (MDPs). However, many systems are inherently
continuous in time, making discrete-time MDPs an inexact modeling choice. In
many applications, such as greenhouse control or medical treatments, each
interaction (measurement or switching of action) involves manual intervention
and thus is inherently costly. Therefore, we generally prefer a time-adaptive
approach with fewer interactions with the system. In this work, we formalize an
RL framework, Time-adaptive Control & Sensing (TaCoS), that tackles this
challenge by optimizing over policies that, besides the control input, also
predict the duration of its application. Our formulation results in an extended MDP that any
standard RL algorithm can solve. We demonstrate that state-of-the-art RL
algorithms trained on TaCoS drastically reduce the number of interactions
compared to their discrete-time counterparts while retaining the same or
improved performance, and that they are robust to the discretization frequency. Finally,
we propose OTaCoS, an efficient model-based algorithm for our setting. We show
that OTaCoS enjoys sublinear regret for systems with sufficiently smooth
dynamics and empirically yields further gains in sample efficiency.
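To make the extended-MDP idea concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Each action is a pair (control, duration); the control is held constant (zero-order hold) for the chosen duration while the continuous-time dynamics are integrated, and every interaction incurs a fixed cost. The class `TimeAdaptiveEnv`, the helper `run_episode`, explicit Euler integration, the parameter `interaction_cost`, and appending the remaining time to the observation are all hypothetical illustrative choices.

```python
import numpy as np


class TimeAdaptiveEnv:
    """Hypothetical extended MDP: each action is (control u, duration tau)."""

    def __init__(self, f, reward_fn, x0, horizon=10.0, interaction_cost=0.1,
                 min_duration=0.05, max_duration=1.0, dt=0.01):
        self.f = f                      # continuous-time dynamics dx/dt = f(x, u)
        self.reward_fn = reward_fn      # running reward rate r(x, u)
        self.x0 = x0
        self.horizon = horizon
        self.interaction_cost = interaction_cost  # assumed fixed cost per interaction
        self.min_duration = min_duration
        self.max_duration = max_duration
        self.dt = dt                    # integration step, independent of the policy

    def _obs(self):
        # Assumption: the remaining time is appended so the policy can plan durations.
        return np.append(self.x, self.horizon - self.t)

    def reset(self):
        self.x = np.array(self.x0, dtype=float)
        self.t = 0.0
        return self._obs()

    def step(self, u, duration):
        # Clip the requested duration and never run past the episode horizon.
        duration = float(np.clip(duration, self.min_duration, self.max_duration))
        duration = min(duration, self.horizon - self.t)
        n_steps = max(1, int(round(duration / self.dt)))
        h = duration / n_steps
        total_reward = 0.0
        for _ in range(n_steps):
            # Hold u fixed and take an explicit Euler step, accumulating reward.
            total_reward += self.reward_fn(self.x, u) * h
            self.x = self.x + h * self.f(self.x, u)
        self.t += duration
        total_reward -= self.interaction_cost  # one interaction per step() call
        done = self.t >= self.horizon - 1e-9
        return self._obs(), total_reward, done


def run_episode(env, policy):
    """Roll out a policy mapping obs -> (u, duration); return (return, #interactions)."""
    obs, done, ret, n = env.reset(), False, 0.0, 0
    while not done:
        u, duration = policy(obs)
        obs, r, done = env.step(u, duration)
        ret += r
        n += 1
    return ret, n
```

For example, on the scalar system dx/dt = -x + u with reward -x², a policy that applies u = 0 for 1.0 time unit per interaction needs 10 interactions over a horizon of 10, whereas one that re-acts every 0.5 time units needs 20. The state trajectories are identical, so the longer-duration policy earns a higher return solely because it pays the interaction cost half as often. Because the wrapper exposes an ordinary (observation, action) → (observation, reward, done) interface, any standard RL algorithm could in principle be trained on it.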

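In summary, the work adapts reinforcement learning beyond discrete-time decision processes to reduce the number of interactions with the system while maintaining or improving performance, and proposes the Time-adaptive Control & Sensing (TaCoS) framework together with the model-based OTaCoS algorithm.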