We study the incentivized information acquisition problem, where a principal
hires an agent to gather information on her behalf. Such a problem is modeled
as a Stackelberg game between the principal and the agent, where the principal
announces a scoring rule that specifies the payment, and the agent then
chooses an effort level that maximizes her own profit and reports the
information. We study the online setting of such a problem from the principal's
perspective, i.e., designing the optimal scoring rule by repeatedly interacting
with the strategic agent. We design a provably sample-efficient algorithm that
tailors the UCB algorithm (Auer et al., 2002) to our model, which achieves a
sublinear $T^{2/3}$-regret after $T$ iterations. Our algorithm features a
delicate estimation procedure for the optimal profit of the principal, and a
conservative correction scheme that ensures the desired actions of the agent are
incentivized. Furthermore, a key feature of our regret bound is that it is
independent of the number of states of the environment.

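Summary: we study the information acquisition problem modeled as a Stackelberg game and design a sample-efficient algorithm that optimizes the scoring rule while ensuring that the agent's actions are incentivized; the resulting regret bound is of order $T^{2/3}$ and is independent of the number of states of the environment.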
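
To make the Stackelberg interaction concrete, here is a minimal sketch of a single round. It is not the algorithm of this work, and everything in it is an illustrative assumption: a binary state with a uniform prior, two hypothetical effort levels, a scaled quadratic (Brier-type) scoring rule, and the names `Effort`, `quadratic_score`, `expected_payment`, and `best_response`. The principal announces the scale of the scoring rule, and the agent picks the effort level that maximizes expected payment minus cost, assuming she reports her posterior truthfully.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effort:
    """Hypothetical effort level (illustrative, not from the paper)."""
    name: str
    cost: float      # agent's cost of exerting this effort
    accuracy: float  # probability that the observed signal equals the binary state


def quadratic_score(report: float, state: int, scale: float) -> float:
    """Scaled quadratic scoring rule: payment for reporting P(state = 1) = report."""
    return scale * (1.0 - (state - report) ** 2)


def expected_payment(effort: Effort, scale: float) -> float:
    """Expected payment under a uniform prior when the agent reports her posterior."""
    q = effort.accuracy
    total = 0.0
    for state in (0, 1):
        for signal in (0, 1):
            # Probability of this signal given the state (symmetric noise).
            p_signal = q if signal == state else 1.0 - q
            # Posterior P(state = 1 | signal) under the uniform prior.
            posterior_one = q if signal == 1 else 1.0 - q
            total += 0.5 * p_signal * quadratic_score(posterior_one, state, scale)
    return total


def best_response(efforts, scale: float) -> Effort:
    """Agent's choice: the effort maximizing expected payment minus cost."""
    return max(efforts, key=lambda e: expected_payment(e, scale) - e.cost)


# Two assumed effort levels: an uninformative free one and a costly informative one.
EFFORTS = [
    Effort(name="low", cost=0.0, accuracy=0.5),
    Effort(name="high", cost=0.1, accuracy=0.9),
]

if __name__ == "__main__":
    for scale in (0.5, 1.0):
        print(scale, best_response(EFFORTS, scale).name)
```

In this toy instance the expected payment simplifies to `scale * (1 - q * (1 - q))` for accuracy `q`, so the costly effort is chosen only once the scale is large enough to cover its cost. This is the kind of incentive constraint the principal has to respect when choosing a scoring rule.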