The ability to separate signal from noise, and reason with clean
abstractions, is critical to intelligence. With this ability, humans can
efficiently perform real-world tasks without considering all possible nuisance
factors. How can artificial agents do the same? What kind of information can
agents safely discard as noise?
In this work, we categorize information out in the wild into four types based
on controllability and relation with reward, and formulate useful information
as that which is both controllable and reward-relevant. This framework
clarifies the kinds of information removed by various prior work on representation
learning in reinforcement learning (RL), and leads to our proposed approach of
learning a Denoised MDP that explicitly factors out certain noise distractors.
Extensive experiments on variants of DeepMind Control Suite and RoboDesk
demonstrate superior performance of our denoised world model over using raw
observations alone, and over prior works, across policy optimization control
tasks as well as the non-control task of joint position regression.
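
The four-way categorization described above can be illustrated with a small
sketch. This is only an enumeration of the two binary properties named in the
abstract (controllability and reward relevance), not the Denoised MDP method
itself; the names `InfoType`, `INFO_TYPES`, and `describe` are hypothetical and
introduced purely for illustration.

```python
from itertools import product
from typing import NamedTuple


class InfoType(NamedTuple):
    """One cell of the 2x2 grid (hypothetical helper, not from the paper's code)."""
    controllable: bool
    reward_relevant: bool

    @property
    def is_useful(self) -> bool:
        # Per the abstract: useful information is both controllable
        # and reward-relevant.
        return self.controllable and self.reward_relevant


# Enumerate all four combinations of the two binary properties.
INFO_TYPES = [InfoType(c, r) for c, r in product([True, False], repeat=2)]


def describe(info: InfoType) -> str:
    """Return a human-readable label for one information type."""
    ctrl = "controllable" if info.controllable else "uncontrollable"
    rew = "reward-relevant" if info.reward_relevant else "reward-irrelevant"
    label = "useful" if info.is_useful else "not useful"
    return f"{ctrl}, {rew}: {label}"


if __name__ == "__main__":
    for info in INFO_TYPES:
        print(describe(info))
```

Under this definition exactly one of the four types counts as useful; which of
the remaining types a given method actually removes is a separate question
that this sketch does not model.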

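This paper proposes a reward-based learning framework that aims to improve representation learning in reinforcement learning by separating signal from noise, extracting useful information, and suppressing certain noise distractors. Experimental results show that it outperforms prior work on control tasks as well as on tasks such as joint position regression.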

Denoised MDPs: Learning World Models Better Than the World Itself

Direct search for objects as part of navigation poses a challenge for small
items. Utilizing context in the form of object-object relationships enables
efficient hierarchical search for targets. Most of the current approaches
tend to directly incorporate sensory input into a reward-based learning
approach, without learning about object relationships in the natural
environment, and thus generalize poorly across domains. We present
Memory-utilized Joint hierarchical Object Learning for Navigation in Indoor
Rooms (MJOLNIR), a target-driven navigation algorithm that considers the
inherent relationships between target objects and the more salient contextual
objects occurring in their surroundings. Extensive experiments conducted across
multiple environment settings show an $82.9\%$ and $93.5\%$ gain over existing
state-of-the-art navigation methods in terms of success rate (SR) and
success weighted by path length (SPL), respectively. We also show that our
model converges much faster than other algorithms, without suffering
from the well-known overfitting problem. Additional details regarding the
supplementary material and code are available at
this https URL
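
For readers unfamiliar with the two metrics, the sketch below computes SR and
SPL using the commonly used definition of SPL, where each successful episode
contributes the ratio of the shortest-path length to the larger of the
agent's path length and the shortest-path length. The abstract does not spell
out the formula, so this definition is an assumption, and the function names
and example numbers are hypothetical.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the target was reached."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by path length (assumed standard definition).

    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i is the success
    indicator, l_i the shortest-path length, and p_i the agent's path length.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("need at least one episode")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if success:
            # A failed episode contributes zero; a successful one is
            # discounted by how much longer the path was than optimal.
            total += shortest / max(taken, shortest)
    return total / len(successes)


if __name__ == "__main__":
    # Three made-up episodes: a detour, a failure, and an optimal path.
    s = [True, False, True]
    shortest = [2.0, 3.0, 4.0]
    taken = [4.0, 5.0, 4.0]
    print(success_rate(s), spl(s, shortest, taken))
```

Because SPL discounts successes by path efficiency, it can never exceed SR on
the same set of episodes.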

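This study proposes MJOLNIR, a target-driven navigation algorithm that uses relationships between objects and environmental context to locate targets. Compared with existing methods, it achieves gains of 82.9% and 93.5% in success rate and success weighted by path length, respectively, across multiple environment settings, while converging faster and avoiding overfitting.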