BriefGPT.xyz
May, 2025
Is there a half-life for the success rates of AI agents?
Toby Ord
TL;DR
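Focusing on a specific suite of research-engineering tasks, this study fills a gap in our understanding of how AI agents perform on long-duration tasks. It proposes a simple mathematical model in which the success rate declines exponentially with task length. The key finding is that the likely underlying cause of failure is that longer tasks involve ever more subtasks, and failing any single subtask causes the whole task to fail.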
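The subtask argument in the summary can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a task splits into n independent subtasks that each succeed with the same probability s. Under those assumptions the task succeeds only if every subtask does, so P(success) = s^n = exp(n ln s), which falls exponentially in n. The function name `task_success_probability` and the value s = 0.95 are hypothetical and are not taken from the paper.

```python
def task_success_probability(per_subtask_success: float, num_subtasks: int) -> float:
    """Probability that a whole task succeeds.

    Illustrative assumptions (hypothetical, not from the paper): the task
    consists of `num_subtasks` independent subtasks, each succeeding with
    the same probability `per_subtask_success`.
    """
    if not 0.0 <= per_subtask_success <= 1.0:
        raise ValueError("per_subtask_success must be in [0, 1]")
    if num_subtasks < 0:
        raise ValueError("num_subtasks must be non-negative")
    # A single failed subtask fails the whole task, so the success
    # probabilities of the independent subtasks multiply: s ** n.
    return per_subtask_success ** num_subtasks


if __name__ == "__main__":
    s = 0.95  # hypothetical per-subtask success rate
    for n in (1, 10, 20, 40):
        print(n, round(task_success_probability(s, n), 3))
```

Each additional block of 10 subtasks multiplies the success rate by the same factor (0.95^10, roughly 0.6). A constant ratio per fixed increment of task length is the signature of exponential decay.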
Abstract
Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model…
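The half-life in the title can be made concrete with a constant-hazard sketch, assuming a fixed chance of failure for each minute of task time. If λ is that per-minute failure rate, the success rate on a task of length t is S(t) = exp(-λt) = 2^(-t/T_half), with half-life T_half = ln 2 / λ. The Python below is illustrative only: the function names and the 60-minute half-life are hypothetical, not values reported by Ord or by Kwa et al.

```python
import math


def success_rate(task_minutes: float, half_life_minutes: float) -> float:
    """Predicted success rate under an assumed constant-hazard model:
    the chance of success halves for every `half_life_minutes` of task time."""
    if half_life_minutes <= 0:
        raise ValueError("half_life_minutes must be positive")
    if task_minutes < 0:
        raise ValueError("task_minutes must be non-negative")
    return 0.5 ** (task_minutes / half_life_minutes)


def hazard_rate(half_life_minutes: float) -> float:
    """Constant per-minute failure rate lambda implied by a half-life."""
    return math.log(2) / half_life_minutes


def horizon_for_success(target: float, half_life_minutes: float) -> float:
    """Task length (minutes) at which the predicted success rate equals `target`."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    # Solve 2 ** (-t / T) = target for t.
    return half_life_minutes * math.log2(1.0 / target)


if __name__ == "__main__":
    T = 60.0  # hypothetical half-life in minutes, not a figure from the paper
    for t in (30, 60, 120, 240):
        print(t, round(success_rate(t, T), 3))
    print("80% horizon:", round(horizon_for_success(0.8, T), 1), "minutes")
```

Under this model, doubling the task length squares the success rate (0.5 at 60 minutes, 0.25 at 120), and the task length at which the success rate is 80% is only log2(1.25), about 0.32, times the half-life.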