Everything else being equal, simpler models should be preferred over more
complex ones. In reinforcement learning (RL), simplicity is typically
quantified on an action-by-action basis -- but this timescale ignores temporal
regularities, like repetitions, often present in sequential strategies. We
therefore propose an RL algorithm that learns to solve tasks with sequences of
actions that are compressible. We explore two possible sources of simple action
sequences: sequences that can be learned by autoregressive models, and
sequences that are compressible with off-the-shelf data compression algorithms.
Distilling these preferences into sequence priors, we derive a novel
information-theoretic objective that incentivizes agents to learn policies that
maximize rewards while conforming to these priors. We show that the resulting
RL algorithm leads to faster learning and attains higher returns than
state-of-the-art model-free approaches in a series of continuous control tasks
from the DeepMind Control Suite. These priors also produce a powerful
information-regularized agent that is robust to noisy observations and can
perform open-loop control.
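
As a rough illustration of the second source of simplicity mentioned above (compressibility under an off-the-shelf compressor), the sketch below compares the zlib-compressed size of a periodic action sequence with that of an irregular one. It is not the paper's algorithm or objective: the discretization of actions into integers 0..255, the choice of zlib, and the helper name `compressed_length` are assumptions made only for this example.

```python
import random
import zlib


def compressed_length(actions: list[int]) -> int:
    """Return the zlib-compressed size, in bytes, of a discrete action sequence.

    Assumption: each action has already been discretized to an integer in 0..255.
    """
    raw = bytes(actions)
    return len(zlib.compress(raw, 9))


rng = random.Random(0)
repetitive = [0, 1, 2, 3] * 64                      # periodic, gait-like pattern
irregular = [rng.randrange(4) for _ in range(256)]  # no temporal structure

# Both sequences have the same length and the same action alphabet, so any
# difference in compressed size comes from temporal regularity alone.
print(compressed_length(repetitive), compressed_length(irregular))
```

The periodic sequence compresses to far fewer bytes than the irregular one, which is the kind of sequence-level regularity that an action-by-action measure of simplicity cannot see.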

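Using an information-theoretic objective with compressible action sequences as priors, this work proposes a new reinforcement learning algorithm that learns to solve tasks with compressible sequences of actions. It outperforms state-of-the-art model-free methods on a series of continuous control tasks and yields a powerful information-regularized agent that is robust to noisy observations and can perform open-loop control.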

Reinforcement Learning with Simple Sequence Priors

Representations of data that are invariant to changes in specified factors
are useful for a wide range of problems: removing potential biases in
prediction problems, controlling the effects of covariates, and disentangling
meaningful factors of variation. Unfortunately, learning representations that
exhibit invariance to arbitrary nuisance factors yet remain useful for other
tasks is challenging. Existing approaches cast the trade-off between task
performance and invariance in an adversarial way, using an iterative minimax
optimization. We show that adversarial training is unnecessary and sometimes
counter-productive; we instead cast invariant representation learning as a
single information-theoretic objective that can be directly optimized. We
demonstrate that this approach matches or exceeds the performance of
state-of-the-art adversarial approaches for learning fair representations and
for generative modeling with controllable transformations.

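Without adversarial training, directly optimizing a single information-theoretic objective matches or exceeds state-of-the-art performance for learning fair representations and for generative modeling with controllable transformations.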

Invariant Representations without Adversarial Training

We address the problem of controlling a mobile robot to explore a partially
known environment. The robot's objective is the maximization of the amount of
information collected about the environment. We formulate the problem as a
partially observable Markov decision process (POMDP) with an
information-theoretic objective function, and solve it by applying forward
simulation algorithms with an open-loop approximation. We present a new
sample-based approximation for mutual information useful in mobile robotics.
The approximation can be seamlessly integrated with forward simulation planning
algorithms. We investigate the usefulness of POMDP-based planning for
exploration and, to alleviate some of its weaknesses, propose a combination with
frontier-based exploration. Experimental results in simulated and real
environments show that, depending on the environment, applying POMDP-based
planning for exploration can improve performance over frontier exploration.
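
To make the idea of estimating mutual information from samples concrete, here is a minimal plug-in estimator over discrete (state, observation) pairs. It is a generic textbook estimator, not the sample-based approximation proposed in the paper, whose details are not given in this abstract. The function name `plugin_mutual_information` and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def plugin_mutual_information(pairs):
    """Plug-in estimate of I(X; Z) in nats from a list of discrete (x, z) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    pz = Counter(z for _, z in pairs)
    mi = 0.0
    for (x, z), c in joint.items():
        # p(x, z) / (p(x) p(z)) simplifies to c * n / (count_x * count_z).
        mi += (c / n) * math.log(c * n / (px[x] * pz[z]))
    return mi


# Hypothetical toy data: x is a map cell's occupancy, z is a sensor reading.
informative = [(0, 0), (1, 1)] * 50                    # z reveals x exactly
uninformative = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25  # z is independent of x

print(plugin_mutual_information(informative))
print(plugin_mutual_information(uninformative))
```

In a forward simulation planner, an estimate of this kind could be computed from simulated observation samples for each candidate action sequence, with the planner preferring the sequence whose observations are most informative about the map.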

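This work addresses the exploration of a partially known environment, formulating it as a partially observable Markov decision process (POMDP) with an information-theoretic objective function and solving it with forward simulation algorithms under an open-loop approximation. It proposes a new sample-based approximation of mutual information for mobile robotics; results show that POMDP-based exploration can improve performance in some environments.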

Planning for robotic exploration based on forward simulation

We introduce a method to learn a hierarchy of successively more abstract
representations of complex data based on optimizing an information-theoretic
objective. Intuitively, the optimization searches for a set of latent factors
that best explain the correlations in the data as measured by multivariate
mutual information. The method is unsupervised, requires no model assumptions,
and scales linearly with the number of variables, which makes it an attractive
approach for very high dimensional systems. We demonstrate that Correlation
Explanation (CorEx) automatically discovers meaningful structure for data from
diverse sources including personality tests, DNA, and human language.
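
The sketch below computes an empirical version of one common form of multivariate mutual information, the total correlation TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), for discrete samples. It assumes this is the quantity the abstract refers to, and it only measures dependence in the data; it does not implement the CorEx optimization over latent factors. The helper names `entropy` and `total_correlation` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Plug-in Shannon entropy (in nats) of a list of hashable outcomes."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def total_correlation(rows):
    """Empirical TC(X) = sum_i H(X_i) - H(X_1, ..., X_n) for rows of discrete values."""
    columns = list(zip(*rows))
    marginal = sum(entropy(list(col)) for col in columns)
    joint = entropy([tuple(r) for r in rows])
    return marginal - joint


# Three variables that always agree are strongly dependent.
copies = [(b, b, b) for b in (0, 1)] * 50
# Two variables covering all combinations equally often are independent.
independent = [(a, b) for a in (0, 1) for b in (0, 1)] * 25

print(total_correlation(copies), total_correlation(independent))
```

Total correlation is zero exactly when the variables are independent, so a latent factor that "explains" the correlations is one that drives this quantity toward zero once the factor is conditioned on.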

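This work proposes a method for learning a hierarchy of successively more abstract representations of complex data by optimizing an information-theoretic objective, searching for the set of latent factors that best explains the correlations in the data as measured by multivariate mutual information. The method is unsupervised, requires no model assumptions, and scales linearly with the number of variables. The authors show that Correlation Explanation (CorEx) automatically discovers meaningful structure in data from diverse sources, including personality tests, DNA, and human language.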