Modern deep learning (DL) workloads increasingly use complex deep reinforcement learning (DRL) algorithms that generate training data within the learning loop. This results in programs with several nested loops and dynamic data dependencies between tensors. While DL systems with eager execution support such dynamism, they lack the optimizations and smart scheduling of graph-based execution. Graph-based execution, however, cannot express dynamic tensor shapes, instead requiring the use of multiple static subgraphs. Either execution model for DRL thus leads to redundant computation, reduced parallelism, and less efficient memory management. We describe TimeRL, a system for executing dynamic DRL programs that combines the dynamism of eager execution with the whole-program optimizations and scheduling of graph-based execution. TimeRL achieves this by introducing the declarative programming model of recurrent tensors, which allows users to define dynamic dependencies as intuitive recurrence equations. TimeRL translates recurrent tensors into a polyhedral dependence graph (PDG) with dynamic dependencies as symbolic expressions. Through simple PDG transformations, TimeRL applies whole-program optimizations, such as automatic vectorization, incrementalization, and operator fusion. The PDG also allows for the computation of an efficient program-wide execution schedule, which decides on buffer deallocations, buffer donations, and GPU/CPU memory swapping. We show that TimeRL executes current DRL algorithms up to 47$\times$ faster than existing DRL systems, while using 16$\times$ less GPU peak memory.
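To make "dynamic dependencies as recurrence equations" concrete, here is a minimal plain-Python sketch of one recurrence that is common in DRL, the discounted return $G_t = r_t + \gamma G_{t+1}$ with $G_T = 0$. It is an illustration only and does not use TimeRL's API. The choice of example and the function names `discounted_returns` and `discounted_returns_naive` are assumptions made for this sketch. Each $G_t$ depends on a later timestep, and the range of that dependency is only known once the episode length $T$ is known. The first function evaluates the recurrence incrementally, while the second recomputes every sum from scratch, which is the kind of redundant computation the abstract mentions.

```python
def discounted_returns(rewards, gamma):
    """Evaluate G[t] = r[t] + gamma * G[t+1], with G[T] = 0, for every t.

    Hypothetical helper for illustration; not part of TimeRL.
    """
    returns = [0.0] * len(rewards)
    future = 0.0  # G[T]: the return past the end of the episode
    # Walk backwards so that G[t+1] is already known when G[t] needs it.
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future
        returns[t] = future
    return returns


def discounted_returns_naive(rewards, gamma):
    """Recompute each G[t] from scratch as sum over k >= t of gamma^(k-t) * r[k].

    Gives the same values as the recurrence, but does O(T^2) work instead of O(T).
    """
    horizon = len(rewards)
    return [
        sum(gamma ** (k - t) * rewards[k] for k in range(t, horizon))
        for t in range(horizon)
    ]
```

Both functions return the same values. The incremental form does linear work by reusing $G_{t+1}$, whereas the naive form does quadratic work.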

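This work addresses data dependencies in complex deep reinforcement learning algorithms by proposing TimeRL, a system that combines the flexibility of eager (dynamic) execution with the whole-program optimizations of graph-based execution. By introducing a declarative programming model of recurrent tensors, TimeRL can express dynamic data dependencies. It runs up to 47$\times$ faster than existing systems and significantly reduces GPU memory usage.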

TimeRL: Efficient Deep Reinforcement Learning with Polyhedral Dependence Graphs