Utkarsh Upadhyay, Robert Busa-Fekete, Wojciech Kotlowski, David Pal, Balazs Szorenyi
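TL;DR: Studies how to optimize web crawling when webpage change rates are unknown and must be estimated online from partially observable signals. Proposes practical estimators and proves performance guarantees for an explore-exploit algorithm.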
Abstract
Web crawling is the problem of keeping a cache of webpages fresh, i.e.,
having the most recent copy available when a page is requested. This problem is
usually coupled with the natural restriction that the bandwi