Programmatically Interpretable Reinforcement Learning (PIRL) encodes policies
in human-readable computer programs. Novel algorithms were recently introduced
with the goal of handling the lack of gradient signal to guide the search in
the space of programmatic policies. Most such PIRL algorithms first train a
neural policy that is used as an oracle to guide the search in the programmatic
space. In this paper, we show that such PIRL-specific algorithms are not
needed, depending on the language used to encode the programmatic policies.
This is because one can use actor-critic algorithms to directly obtain a
programmatic policy. We use a connection between ReLU neural networks and
oblique decision trees to translate the policy learned with actor-critic
algorithms into programmatic policies. This translation from ReLU networks
allows us to synthesize policies encoded in programs with if-then-else
structures, linear transformations of the input values, and PID operations.
Empirical results on several control problems show that this translation
approach is capable of learning short and effective policies. Moreover, the
translated policies are at least competitive with, and often far superior to,
the policies that PIRL algorithms synthesize.
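
The connection between ReLU networks and oblique decision trees mentioned above can be illustrated with a minimal sketch. It is not the paper's translation procedure: the weights, the one-hidden-layer shape, and the names `relu_policy`, `tree_policy`, and `to_program` are assumptions made up for this example, and PID operations are not covered. The idea shown is only that each hidden ReLU unit contributes one linear test on the input, and each combination of test outcomes selects one linear function of the input.

```python
from itertools import product

# Hypothetical weights of a tiny ReLU policy: 2 inputs, 2 hidden units, 1 output.
W1 = [[1.0, -1.0], [0.5, 2.0]]  # hidden-layer weights, one row per neuron
B1 = [0.0, -1.0]                # hidden-layer biases
W2 = [2.0, -3.0]                # output weights
B2 = 0.5                        # output bias


def preactivations(x):
    """Linear part of each hidden neuron, before the ReLU."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, B1)]


def relu_policy(x):
    """Ordinary forward pass of the network."""
    hidden = [max(0.0, z) for z in preactivations(x)]
    return sum(v * h for v, h in zip(W2, hidden)) + B2


def build_leaves():
    """Map each activation pattern to the linear function it induces."""
    leaves = {}
    for pattern in product([False, True], repeat=len(W1)):
        coeffs = [0.0] * len(W1[0])
        bias = B2
        for active, row, b, v in zip(pattern, W1, B1, W2):
            if active:  # an active neuron passes its linear part through
                coeffs = [c + v * w for c, w in zip(coeffs, row)]
                bias += v * b
        leaves[pattern] = (coeffs, bias)
    return leaves


LEAVES = build_leaves()


def tree_policy(x):
    """Oblique decision tree: linear tests on x pick a linear leaf."""
    pattern = tuple(z > 0.0 for z in preactivations(x))
    coeffs, bias = LEAVES[pattern]
    return sum(c * xi for c, xi in zip(coeffs, x)) + bias


def to_program(depth=0, pattern=()):
    """Render the tree as nested if-then-else text."""
    indent = "    " * depth
    if depth == len(W1):
        coeffs, bias = LEAVES[pattern]
        terms = " + ".join(f"{c:g}*x[{i}]" for i, c in enumerate(coeffs))
        return f"{indent}return {terms} + {bias:g}\n"
    test = " + ".join(f"{w:g}*x[{i}]" for i, w in enumerate(W1[depth]))
    return (f"{indent}if {test} + {B1[depth]:g} > 0:\n"
            + to_program(depth + 1, pattern + (True,))
            + f"{indent}else:\n"
            + to_program(depth + 1, pattern + (False,)))


if __name__ == "__main__":
    print(to_program())
```

In this sketch the tree enumerates every activation pattern, so it has one leaf per pattern (four here). A practical translation would presumably need to keep the network small or prune unreachable leaves to obtain short programs.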

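In this paper, we show a connection that translates policies learned with actor-critic algorithms into policies encoded as programs, which avoids the need for PIRL-specific algorithms. Empirical results show that this translation approach can learn short and effective policies, and that the translated policies are at least competitive with, and often better than, those of PIRL algorithms.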

Synthesizing Programmatic Policies with Actor-Critic Algorithms and ReLU Networks

The goal of self-supervised learning from images is to construct image
representations that are semantically meaningful via pretext tasks that do not
require semantic annotations for a large training set of images. Many pretext
tasks lead to representations that are covariant with image transformations. We
argue that, instead, semantic representations ought to be invariant under such
transformations. Specifically, we develop Pretext-Invariant Representation
Learning (PIRL, pronounced as "pearl") that learns invariant representations
based on pretext tasks. We use PIRL with a commonly used pretext task that
involves solving jigsaw puzzles. We find that PIRL substantially improves the
semantic quality of the learned image representations. Our approach sets a new
state-of-the-art in self-supervised learning from images on several popular
benchmarks for self-supervised learning. Despite being unsupervised, PIRL
outperforms supervised pre-training in learning image representations for
object detection. Altogether, our results demonstrate the potential of
self-supervised learning of image representations with good invariance
properties.
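
The difference between covariant and invariant representations can be made concrete with a small sketch. The loss below is a generic InfoNCE-style contrastive loss, swapped in for illustration; it is not claimed to be the exact objective used by PIRL. The function names, the temperature value, and the toy 2-D vectors are all assumptions. The loss is low when the representation of a transformed image stays close to that of the original image, relative to the representations of other images.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def invariance_loss(rep, rep_transformed, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's exact loss).

    rep:             representation of the original image
    rep_transformed: representation of the transformed image (e.g. a jigsaw)
    negatives:       representations of other images
    """
    pos = math.exp(cosine(rep, rep_transformed) / temperature)
    neg = sum(math.exp(cosine(rep_transformed, n) / temperature)
              for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy 2-D representations, chosen by hand for illustration only.
original = [1.0, 0.0]
others = [[0.0, 1.0], [-1.0, 0.0]]

# Invariant encoder: the transformation leaves the representation unchanged.
loss_invariant = invariance_loss(original, [1.0, 0.0], others)

# Covariant encoder: the transformation moves the representation elsewhere.
loss_covariant = invariance_loss(original, [0.0, 1.0], others)

if __name__ == "__main__":
    print(loss_invariant, loss_covariant)
```

With these toy vectors the invariant case yields a much smaller loss than the covariant one, which is the behavior an invariance-encouraging objective is meant to reward.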

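This study shows that, in the unsupervised setting, PIRL-based pretext tasks can substantially improve the semantic quality of image representations, and that the approach can be used to extract image representations with good invariance properties, for example for object detection.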