I present arguments against the hypothesis put forward by Silver, Singh, Precup, and Sutton (https://www.sciencedirect.com/science/article/pii/S0004370221000862) that reward is enough. I argue that reward maximisation is not enough to explain many activities associated with natural and artificial intelligence, including knowledge, learning, perception, social intelligence, evolution, language, generalisation and imitation. I show that such reductio ad lucrum has its intellectual origins in the political economy of Homo economicus and substantially overlaps with the radical version of behaviourism. I show why the reinforcement learning paradigm, despite its demonstrable usefulness in some practical applications, is an incomplete framework for intelligence, whether natural or artificial. Complexities of intelligent behaviour are not simply second-order complications on top of reward maximisation. This fact has profound implications for the development of practically usable, smart, safe and robust artificially intelligent agents.

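This paper challenges the reward maximisation hypothesis proposed by Silver et al. and argues that the reinforcement learning paradigm, although useful in some practical applications, is not a complete framework for intelligence, because the complexities of intelligent behaviour are not merely second-order complications on top of reward maximisation. This fact has profound implications for the development of practically usable, smart, safe and robust artificially intelligent agents.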

Reward is not enough: can we liberate AI from the reinforcement learning paradigm?