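TL;DR: WebCanvas is an innovative online evaluation framework that effectively addresses the dynamic nature of web interactions. It comprises evaluation metrics, a benchmark dataset, and annotation tools, and the authors open-source an agent framework capable of online inference and evaluation.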
Abstract
For web agents to be practically useful, they must adapt to the continuously
evolving web environment, which is characterized by frequent updates to user
interfaces and content. However, most existing benchmarks only capture the static aspects
of the web. To bridge this gap, we introduce WebCanv