People's decisions about how to allocate their limited computational
resources are essential to human intelligence. An important component of this
metacognitive ability is deciding whether to continue thinking about what to do
or to move on to the next decision. Here, we show that people acquire this
ability through learning and reverse-engineer the underlying learning
mechanisms. Using a process-tracing paradigm that externalises human planning,
we find that people quickly adapt how much planning they perform to the cost
and benefit of planning. To discover the underlying metacognitive learning
mechanisms, we augmented a set of reinforcement learning models with
metacognitive features and performed Bayesian model selection. Our results
suggest that the metacognitive ability to adjust the amount of planning might
be learned through a policy-gradient mechanism that is guided by metacognitive
pseudo-rewards that communicate the value of planning.
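
As a rough illustration of what such a mechanism could look like, the following minimal sketch trains a one-parameter stop-or-continue policy with a REINFORCE-style update, where every planning step yields an immediate pseudo-reward. It is not the model fitted in this work. The toy benefit curve (each additional planning step halves the remaining gain), the per-step `cost`, and the names `run_episode` and `train` are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_episode(theta, cost, rng, max_steps=10):
    """Sample one planning episode from a Bernoulli stop/continue policy.

    Returns the per-decision gradients of log pi with respect to theta and
    the per-decision pseudo-rewards. The pseudo-reward of the k-th planning
    step is its (assumed) marginal benefit 0.5 ** (k + 1) minus the cost.
    """
    p_continue = sigmoid(theta)
    grads, rewards = [], []
    for k in range(max_steps):
        if rng.random() < p_continue:
            grads.append(1.0 - p_continue)        # d/dtheta log p
            rewards.append(0.5 ** (k + 1) - cost)
        else:
            grads.append(-p_continue)             # d/dtheta log (1 - p)
            rewards.append(0.0)                   # stopping earns nothing more
            break
    return grads, rewards


def train(cost, episodes=3000, lr=0.05, seed=0):
    """Policy-gradient learning of how readily to keep planning."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(episodes):
        grads, rewards = run_episode(theta, cost, rng)
        reward_to_go, update = 0.0, 0.0
        # Weight each decision by the pseudo-rewards that followed it.
        for g, r in zip(reversed(grads), reversed(rewards)):
            reward_to_go += r
            update += g * reward_to_go
        theta += lr * update
    return sigmoid(theta)  # learned probability of continuing to plan


if __name__ == "__main__":
    print("cheap planning: ", train(cost=0.01))
    print("costly planning:", train(cost=0.4))
```

Under these assumptions, the learned probability of continuing to plan should end up higher when planning is cheap than when it is costly, which mirrors the qualitative adaptation described above.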

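This study examines the metacognitive ability to decide how to allocate limited computational resources and the learning mechanisms behind it. The results indicate that people acquire this ability through learning, possibly via a policy-gradient mechanism that adjusts how much planning they perform.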