Computational accounts of purposeful behavior consist of descriptive and normative aspects. The former enable agents to ascertain the current (or future) state of affairs in the world, and the latter enable them to evaluate the desirability, or lack thereof, of these states with respect to the agent's goals. In Reinforcement Learning, the normative aspect (reward and value functions) is assumed to depend on a pre-defined and fixed descriptive one (state representation). Alternatively, these two aspects may emerge interdependently: goals can be, and indeed often are, expressed in terms of state representation features, but they may also serve to shape state representations themselves. Here, we illustrate a novel theoretical framing of state representation learning in bounded agents, coupling descriptive and normative aspects via the notion of goal-directed, or telic, states. We define a new controllability property of telic state representations to characterize the tradeoff between their granularity and the policy complexity capacity required to reach all telic states. We propose an algorithm for learning controllable state representations and demonstrate it using a simple navigation task with changing goals. Our framework highlights the crucial role of deliberate ignorance (knowing what to ignore) for learning state representations that are both goal-flexible and simple. More broadly, our work provides a concrete step towards a unified theoretical view of natural and artificial learning through the lens of goals.
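
As a toy illustration of "knowing what to ignore", the sketch below groups the cells of a small grid into classes that a given goal cannot tell apart. It is not the algorithm proposed in the paper: the grid, the `telic_partition` helper, and the goal functions are hypothetical names chosen for this example, and a goal is assumed to be simply a function that labels each state.

```python
from collections import defaultdict


def telic_partition(states, goal):
    """Group states into classes that the goal labels identically.

    Assumption for this sketch: a goal is a function mapping each state
    to a hashable label, and states with the same label are merged.
    """
    classes = defaultdict(list)
    for state in states:
        classes[goal(state)].append(state)
    return dict(classes)


# Hypothetical 4x4 navigation grid: states are (x, y) cells.
states = [(x, y) for x in range(4) for y in range(4)]

# A goal that only cares whether the agent is on the right edge (x == 3).
coarse = telic_partition(states, lambda s: s[0] == 3)

# A goal that distinguishes every cell, so nothing is ignored.
fine = telic_partition(states, lambda s: s)

print(len(coarse), len(fine))
```

Under these assumptions the first goal collapses the sixteen cells into two classes, while the second keeps all sixteen. A coarser partition gives the agent fewer distinct targets to reach, which is the intuition behind trading granularity against policy complexity.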

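Computational accounts of purposeful behavior have a descriptive aspect, which determines the current (or future) state of the world, and a normative aspect, which evaluates how desirable those states are with respect to the agent's goals. This paper proposes a new theoretical framing of state representation learning in bounded agents that couples the descriptive and normative aspects through the notion of goal-directed, or telic, states. We define a new controllability property of telic state representations to characterize the tradeoff between their granularity and the policy complexity capacity required to reach all telic states. We propose an algorithm for learning controllable state representations and demonstrate it on a simple navigation task. Our framework highlights the importance of deliberately ignoring some information in order to learn state representations that are both goal-flexible and simple. Overall, our work takes a concrete step towards a unified theoretical view of natural and artificial learning through the lens of goals.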

Learning Telic-Controllable State Representations