BriefGPT.xyz
Jul, 2024
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages
Max Zuo, Francisco Piedrahita Velez, Xiaochen Li, Michael L. Littman, Stephen H. Bach
TL;DR
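Evaluating how well language models generate PDDL code from natural-language task descriptions is difficult. The authors therefore introduce Planetarium, a benchmark dataset of 132,037 text-to-PDDL pairs; their evaluation of several language models on it shows how complex the task is.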
Abstract
Many recent works have explored using language models for planning problems. One line of research focuses on translating natural language descriptions of planning tasks into structured planning languages, such as …