The quality of images captured outdoors is often degraded by the weather. Rain
in particular can obstruct the view of both human observers and the computer
vision applications that rely on those images. This work aims to restore rainy
images by removing rain streaks via Self-supervised Reinforcement Learning (RL)
for image deraining (SRL-Derain). We locate rain-streak pixels in the input
rainy image via dictionary learning and use pixel-wise RL agents that take
multiple inpainting actions to remove rain progressively. To our knowledge,
this is the first application of self-supervised RL to image deraining.
Experimental results on several benchmark image-deraining datasets show that
the proposed SRL-Derain performs favorably against state-of-the-art few-shot
and self-supervised deraining and denoising methods.
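
The mechanics of mask-guided, pixel-wise actions can be sketched as follows. This is only an illustration under stated assumptions, not the SRL-Derain implementation: the action set (keep, 3x3 mean, 3x3 median), the helper names, and the fixed stand-in policy `always_median` are all hypothetical, and the rain mask is passed in directly rather than obtained by dictionary learning.

```python
import numpy as np

def neighborhoods(img):
    """Stack the 3x3 neighborhood of every pixel; output shape is (9, H, W)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def candidate_actions(img):
    """Hypothetical action set: 0 = keep, 1 = 3x3 mean, 2 = 3x3 median."""
    nb = neighborhoods(img)
    return np.stack([img, nb.mean(axis=0), np.median(nb, axis=0)])

def step(img, rain_mask, action_map):
    """Apply each pixel's chosen action, but only where rain was detected."""
    cands = candidate_actions(img)  # shape (A, H, W)
    chosen = np.take_along_axis(cands, action_map[None], axis=0)[0]
    return np.where(rain_mask, chosen, img)

def derain(img, rain_mask, policy, steps=5):
    """Run several steps so that rain pixels are filled in progressively."""
    for _ in range(steps):
        img = step(img, rain_mask, policy(img, rain_mask))
    return img

def always_median(img, rain_mask):
    """Stand-in for a learned policy: every pixel picks the median action."""
    return np.full(img.shape, 2)

# Toy input: flat gray image with one bright vertical streak.
image = np.full((6, 6), 0.5)
image[:, 3] = 1.0
mask = image > 0.9  # assumed mask; the paper locates it via dictionary learning
restored = derain(image, mask, always_median)
```

In a learned setting, `policy` would be replaced by a network that outputs one action per pixel; pixels outside the mask are never modified, whatever the action map says.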

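This work uses self-supervised reinforcement learning (RL) for image deraining: dictionary learning locates the rain-streak pixels, and pixel-wise RL agents remove the rain progressively. Experiments on several benchmark image-deraining datasets show that the method compares favorably with state-of-the-art few-shot and self-supervised deraining and denoising methods.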

Image Deraining via Self-supervised Reinforcement Learning

Supervised regression to demonstrations has been shown to be a stable way to
train deep policy networks. This motivates us to study how to take full
advantage of supervised loss functions for stably training deep reinforcement
learning agents. The task is challenging because it is unclear how training
data should be collected to enable policy improvement. In this work, we propose
Self-Supervised Reinforcement Learning (SSRL), a simple algorithm that
optimizes policies with purely supervised losses. We demonstrate that, without
policy gradients or value estimation, an iterative procedure of "labeling" data
and supervised regression is sufficient to drive stable policy improvement. By
selecting and imitating trajectories with high episodic rewards, SSRL is
surprisingly competitive with contemporary algorithms while offering more
stable performance and shorter running time, showing the potential of solving
reinforcement learning with supervised learning techniques. The code is
available at this https URL
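
The loop of labeling followed by supervised regression can be illustrated with a small tabular sketch. It is not the authors' SSRL code: the toy chain environment, the top-fraction selection rule, the count-based maximum-likelihood fit, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 5, 2, 8  # toy chain; values are illustrative

def rollout(probs, rng):
    """One episode: action 1 moves right, action 0 moves left.
    A reward of 1 is earned for every step that ends in the right-most state."""
    s, traj, ret = 0, [], 0.0
    for _ in range(HORIZON):
        a = rng.choice(N_ACTIONS, p=probs[s])
        traj.append((s, a))
        s = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        ret += float(s == N_STATES - 1)
    return traj, ret

def train(iterations=30, episodes=64, top_frac=0.25, smoothing=1.0, seed=0):
    rng = np.random.default_rng(seed)
    probs = np.full((N_STATES, N_ACTIONS), 1.0 / N_ACTIONS)
    history = []  # mean episodic return of each iteration's batch
    for _ in range(iterations):
        batch = [rollout(probs, rng) for _ in range(episodes)]
        history.append(float(np.mean([ret for _, ret in batch])))
        # "Labeling": keep the trajectories with the highest episodic return.
        batch.sort(key=lambda item: item[1], reverse=True)
        elite = batch[: max(1, int(top_frac * episodes))]
        # Supervised regression: for a tabular policy, the maximum-likelihood
        # fit to the elite (state, action) pairs is the smoothed action frequency.
        counts = np.full((N_STATES, N_ACTIONS), smoothing)
        for traj, _ in elite:
            for s, a in traj:
                counts[s, a] += 1.0
        probs = counts / counts.sum(axis=1, keepdims=True)
    return probs, history

policy_table, returns = train()
```

No policy gradient or value function appears anywhere in the loop; improvement comes only from refitting the policy to its own best episodes. With a neural policy, the counting step would be replaced by minimizing a cross-entropy loss on the selected pairs.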

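Building on supervised regression for learning policy networks, this work proposes SSRL, an algorithm that trains deep reinforcement learning agents with supervised loss functions. It requires no policy gradient or value estimation, improves the policy stably through supervised regression on collected data, and is competitive with existing algorithms in both efficiency and performance, showing the potential of supervised learning techniques for solving reinforcement learning problems.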

Simplifying Deep Reinforcement Learning via Self-Supervision

Learning to reach goal states and learning diverse skills through mutual
information (MI) maximization have been proposed as principled frameworks for
self-supervised reinforcement learning, allowing agents to acquire broadly
applicable multitask policies with minimal reward engineering. Starting from a
simple observation that the standard goal-conditioned RL (GCRL) is encapsulated
by the optimization objective of variational empowerment, we discuss how GCRL
and MI-based RL can be generalized into a single family of methods, which we
name variational GCRL (VGCRL), interpreting variational MI maximization, or
variational empowerment, as representation learning methods that acquire
functionally-aware state representations for goal reaching. This novel
perspective allows us to: (1) derive simple but unexplored variants of GCRL to
study how adding small representation capacity can already expand its
capabilities; (2) investigate how discriminator function capacity and
smoothness determine the quality of discovered skills, or latent goals, through
modifying latent dimensionality and applying spectral normalization; (3) adapt
techniques such as hindsight experience replay (HER) from GCRL to MI-based RL;
and lastly, (4) propose a novel evaluation metric, named latent goal reaching
(LGR), for comparing empowerment algorithms with different choices of latent
dimensionality and discriminator parameterization. Through principled
mathematical derivations and careful experimental studies, our work lays a
novel foundation from which to evaluate, analyze, and develop representation
learning techniques in goal-based RL.
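
The observation that standard GCRL sits inside the variational empowerment objective can be made concrete with the usual variational lower bound on mutual information. The notation below ($s$ for the state, $z$ for the latent goal, $q_\lambda$ for the discriminator, $\sigma$ for a fixed scale) is assumed for illustration and may differ from the paper's.

```latex
\begin{align}
I(s; z) &= H(z) - H(z \mid s) \\
        &\ge H(z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi(\cdot \mid z)}
             \bigl[ \log q_\lambda(z \mid s) \bigr].
\end{align}
% Assumed special case: a fixed Gaussian discriminator centered at the state,
% q_\lambda(z \mid s) = \mathcal{N}(z;\, s,\, \sigma^2 I), gives
\begin{equation}
\log q_\lambda(z \mid s) = -\frac{1}{2\sigma^2} \lVert z - s \rVert^2 + \text{const},
\end{equation}
% which, up to scale and an additive constant, is the negative squared-distance
% reward commonly used in goal-conditioned RL.
```

The inequality follows because the KL divergence between the true posterior $p(z \mid s)$ and $q_\lambda(z \mid s)$ is non-negative. Letting $q_\lambda$ have learnable parameters instead of a fixed identity mean is one way to read the "small representation capacity" variants mentioned in item (1).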

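Variational mutual information maximization, interpreted as learning functionally-aware state representations for goal reaching, provides a framework for self-supervised reinforcement learning that yields broadly applicable multitask goal-reaching policies. The work unifies GCRL and MI-based RL into a single family of methods, VGCRL, analyzes how discriminator capacity and smoothness affect what the agent can learn, and proposes an evaluation metric, LGR, for comparing latent-goal discovery algorithms with different discriminator parameterizations.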