How can artificial agents learn non-reinforced preferences to continuously
adapt their behaviour to a changing environment? We decompose this question
into two challenges: ($i$) encoding diverse memories and ($ii$) selectively
attending to these for preference formation. Our proposed
\emph{no}n-\emph{re}inforced preference learning mechanism using selective
attention, \textsc{Nore}, addresses both by leveraging the agent's world model
to collect a diverse set of experiences which are interleaved with imagined
roll-outs to encode memories. These memories are selectively attended to, using
attention and gating blocks, to update the agent's preferences. We validate
\textsc{Nore} in a modified OpenAI Gym FrozenLake environment (without any
external signal), with and without volatility, under a fixed model of the
environment, and compare its behaviour to \textsc{Pepper}, a Hebbian
preference learning mechanism. We demonstrate that \textsc{Nore} provides a
straightforward framework to induce exploratory preferences in the absence of
external signals.

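This work proposes \textsc{Nore}, a mechanism by which artificial agents learn non-reinforced preferences. It leverages the agent's world model to collect diverse experiences, then updates the agent's preferences through selective attention and gating, and is shown to induce exploratory preferences in the absence of external signals, both with and without volatility.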