The ability of large language models (LLMs) to engage in credible dialogues
with humans, taking into account the training data and the context of the
conversation, has prompted debate about whether they can exhibit intrinsic
motivations, agency, or even some degree of consciousness. We argue that the
internal architecture of LLMs and their finite and volatile state cannot
support any of these properties. By combining insights from complementary
learning systems, global neuronal workspace, and attention schema theories, we
propose to integrate LLMs and other deep learning systems into an architecture
for cognitive language agents able to exhibit properties akin to agency,
self-motivation, and even some features of meta-cognition.

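By integrating large language models with other deep learning systems, the paper proposes an architecture for cognitive language agents that can exhibit properties akin to agency, self-motivation, and even some features of meta-cognition.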

DeepThought: An Architecture for Autonomous Self-motivated Systems

In the realm of multi-agent reinforcement learning, intrinsic motivations
have emerged as a pivotal tool for exploration. While the computation of many
intrinsic rewards relies on estimating variational posteriors using neural
network approximators, a notable challenge has surfaced due to the limited
expressive capability of these neural statistics approximators. We pinpoint
this challenge as the "revisitation" issue, where agents recurrently explore
confined areas of the task space. To combat this, we propose a dynamic reward
scaling approach. This method is crafted to stabilize the significant
fluctuations in intrinsic rewards in previously explored areas and promote
broader exploration, effectively curbing the revisitation phenomenon. Our
experimental findings underscore the efficacy of our approach, showcasing
enhanced performance in demanding environments like Google Research Football
and StarCraft II micromanagement tasks, especially in sparse reward settings.

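In multi-agent reinforcement learning, intrinsic motivations have emerged as an important tool for exploration. The paper proposes a dynamic reward scaling method that addresses the limited expressive capability of neural statistics approximators and curbs repeated revisits to the same areas of the task space. It reports improved performance in challenging environments such as Google Research Football and StarCraft II micromanagement tasks, especially in sparse reward settings.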
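
The abstract does not spell out the scaling rule, so the following is only a minimal sketch of the general idea of damping intrinsic rewards in areas that have already been explored. It swaps in a simple visit-count decay as a stand-in; the class name `VisitScaledReward`, the grid discretisation, and the `1 / sqrt(n)` factor are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import defaultdict


class VisitScaledReward:
    """Hypothetical helper: shrink intrinsic reward in often-visited cells."""

    def __init__(self, cell_size=1.0):
        # Assumption: states are numeric tuples, discretised on a uniform grid.
        self.cell_size = cell_size
        self.visits = defaultdict(int)

    def _cell(self, state):
        # Map a continuous state to the grid cell that contains it.
        return tuple(math.floor(x / self.cell_size) for x in state)

    def scale(self, state, intrinsic_reward):
        # Count this visit, then divide the reward by sqrt(visit count),
        # so the first visit keeps the full reward and repeats are damped.
        cell = self._cell(state)
        self.visits[cell] += 1
        return intrinsic_reward / math.sqrt(self.visits[cell])
```

Under these assumptions, a state in a fresh cell keeps its full intrinsic reward, while repeated visits to the same cell receive progressively smaller rewards, which nudges agents away from confined areas.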

Never Explore Repeatedly in Multi-Agent Reinforcement Learning

Autonomous open-ended learning is a relevant approach in machine learning and
robotics, allowing the design of artificial agents able to acquire goals and
motor skills without the necessity of user-assigned tasks. A crucial issue for
this approach is to develop strategies to ensure that agents can maximise their
competence on as many tasks as possible in the shortest possible time.
Intrinsic motivations have proven to generate a task-agnostic signal to
properly allocate the training time amongst goals. While the majority of works
in the field of intrinsically motivated open-ended learning focus on scenarios
where goals are independent of each other, only a few have studied the
autonomous acquisition of interdependent tasks, and even fewer have tackled
scenarios where goals involve non-stationary interdependencies. Building on
previous works, we tackle these crucial issues at the level of decision making
(i.e., building strategies to properly select between goals), and we propose a
hierarchical architecture that, by treating sub-task selection as a Markov
Decision Process, is able to properly learn interdependent skills on the basis
of intrinsically generated motivations. In particular, we first deepen the
analysis of a previous system, showing the importance of incorporating
information about the relationships between tasks at a higher level of the
architecture (that of goal selection). Then we introduce H-GRAIL, a new system
that extends the previous one by adding a new learning layer to store the
autonomously acquired sequences of tasks so that they can be modified when the
interdependencies are non-stationary. All systems are tested in a real robotic
scenario, with a Baxter robot performing multiple interdependent reaching
tasks.

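The paper proposes a hierarchical architecture based on a Markov Decision Process that uses intrinsic motivations to maximise a robot's ability to learn multiple interdependent goals. It also introduces a new system, H-GRAIL, which stores autonomously acquired task sequences so that they can be modified when the interdependencies are non-stationary.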
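
To make the idea of treating sub-task selection as a Markov Decision Process more concrete, here is a minimal sketch using tabular Q-learning. It is not the H-GRAIL implementation: the class name `GoalSelector`, the choice of state (for example, the set of goals already achieved), and the use of an intrinsic reward such as competence improvement are assumptions made for illustration.

```python
import random
from collections import defaultdict


class GoalSelector:
    """Hypothetical goal selector: tabular Q-learning over (state, goal)."""

    def __init__(self, goals, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.goals = list(goals)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(float)  # (state, goal) -> estimated value
        self.rng = random.Random(seed)

    def select(self, state):
        # Epsilon-greedy choice of the next goal to practise.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.goals)
        return max(self.goals, key=lambda g: self.q[(state, g)])

    def update(self, state, goal, intrinsic_reward, next_state):
        # Standard Q-learning update; the reward is assumed to be an
        # intrinsically generated signal (e.g. competence improvement).
        best_next = max(self.q[(next_state, g)] for g in self.goals)
        target = intrinsic_reward + self.gamma * best_next
        self.q[(state, goal)] += self.alpha * (target - self.q[(state, goal)])
```

A usage sketch under the same assumptions: with `state = frozenset()` (nothing achieved yet), calling `update(state, "B", 1.0, frozenset({"B"}))` raises the value of attempting goal `"B"` first, so a greedy `select(state)` will then prefer it. Because values depend on the state, goals whose preconditions are other goals can end up being selected in a useful order.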