The lottery ticket hypothesis questions the role of overparameterization in supervised deep learning. But how does the distributional shift inherent to the reinforcement learning problem affect the performance of winning lottery tickets? In this work, we show that feed-forward networks trained via supervised policy distillation and reinforcement learning can be pruned to the same level of sparsity. Furthermore, we establish the existence of winning tickets for both on- and off-policy methods in a visual navigation and classic control task. Using a set of carefully designed baseline conditions, we find that the majority of the lottery ticket effect in reinforcement learning can be attributed to the identified mask. The resulting masked observation space eliminates redundant information and yields minimal task-relevant representations. The mask identified by iterative magnitude pruning provides an interpretable inductive bias. Its costly generation can be amortized by training dense agents with low-dimensional input and thereby at lower computational cost.
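As a rough illustration of the procedure referred to above, the following sketch runs iterative magnitude pruning on a single input layer and then reads off which observation dimensions the resulting mask keeps. It is a toy under stated assumptions, not the authors' implementation: the function names, the 20% per-round pruning fraction, rewinding to the initial weights, and the `toy_train` stand-in (which merely shrinks the weights of four "uninformative" inputs instead of running reinforcement learning or distillation) are all hypothetical choices made for this example.

```python
import random


def magnitude_prune(weights, mask, prune_fraction):
    """Zero out the smallest-magnitude surviving weights and return a new mask."""
    alive = [
        (abs(weights[i][j]), i, j)
        for i in range(len(weights))
        for j in range(len(weights[0]))
        if mask[i][j] == 1
    ]
    alive.sort()  # ascending by magnitude
    n_prune = int(len(alive) * prune_fraction)
    new_mask = [row[:] for row in mask]
    for _, i, j in alive[:n_prune]:
        new_mask[i][j] = 0
    return new_mask


def iterative_magnitude_pruning(init_weights, train_fn, rounds, prune_fraction):
    """Alternate training and pruning, rewinding to the initial weights each round."""
    mask = [[1] * len(row) for row in init_weights]
    for _ in range(rounds):
        # Rewind: surviving weights restart from their initial values (assumption).
        rewound = [
            [w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(init_weights, mask)
        ]
        trained = train_fn(rewound, mask)
        mask = magnitude_prune(trained, mask, prune_fraction)
    return mask


def active_inputs(mask):
    """Input dimensions that keep at least one weight (rows are hidden units)."""
    return [j for j in range(len(mask[0])) if any(row[j] for row in mask)]


def toy_train(weights, mask):
    """Hypothetical stand-in for training: inputs 2-5 end up with tiny weights."""
    scale = [1.0, 1.0, 0.01, 0.01, 0.01, 0.01]
    return [[w * s for w, s in zip(row, scale)] for row in weights]


def make_init_weights(n_hidden=4, n_inputs=6, seed=0):
    """Random initial weights with magnitude in [0.5, 1.0] and random sign."""
    rng = random.Random(seed)
    return [
        [rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.0) for _ in range(n_inputs)]
        for _ in range(n_hidden)
    ]


init = make_init_weights()
ticket_mask = iterative_magnitude_pruning(init, toy_train, rounds=6, prune_fraction=0.2)
print("remaining weights:", sum(map(sum, ticket_mask)))
print("observation dimensions kept:", active_inputs(ticket_mask))
```

In this toy setting the input dimensions that lose all of their weights drop out of the observation altogether, which is the sense in which a pruning mask can be read as a masked, lower-dimensional observation space on which a dense agent could then be trained.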

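This paper studies how the lottery ticket hypothesis fares in reinforcement learning tasks. By comparing the task performance of feed-forward networks trained via behavioral cloning with that of reinforcement learning agents, we find that the former can be pruned to a higher level of sparsity without hurting performance. We also find that part of the lottery ticket effect stems from the mask produced by network pruning rather than from the parameter initialization. Finally, we propose an initialization method that better supports early learning in low-dimensional control tasks.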

On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning