BriefGPT.xyz
Mar, 2024
Tur[k]ingBench: A Challenge Benchmark for Web Agents
Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly...
TL;DR
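Experiments probe how well multimodally pretrained models understand web pages. The benchmark reveals the strengths and weaknesses of existing models, and the authors hope it will facilitate the evaluation and development of web agents.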
Abstract
Recent chatbots have demonstrated impressive ability to understand and communicate in raw-text form. However, there is more to the world than raw text. For example, humans spend long hours of their time on web pages