Apr, 2025
TextArena
Leon Guertler, Bobby Cheng, Simon Yu, Bo Liu, Leshem Choshen...
TL;DR
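Traditional benchmarks fail to evaluate dynamic social skills such as negotiation, theory of mind, and deception. This work addresses that gap with an open-source collection of competitive text-based games for training and evaluating the behavior of large language models. With 57+ unique environments, TextArena lets users easily run online evaluations of model capabilities, and it is designed for extensibility to encourage research and community contributions.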
Abstract
TextArena is an open-source collection of competitive text-based games for training and evaluation of agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, t