We study a repeated delegated choice problem, the first work to consider an online learning variant of the delegated choice model of Kleinberg and Kleinberg (EC'18). In this model, a principal interacts repeatedly with an agent who holds an exogenous set of solutions and searches it for efficient ones. Each solution can yield different utility for the principal and for the agent, and the agent may selfishly propose the solution that maximizes its own utility. To mitigate this behavior, the principal announces an eligible set, which screens out some of the solutions. The principal, however, has no prior information about the distribution of solutions, so it must announce different eligible sets over time in order to learn that distribution efficiently. The principal's objective is to minimize cumulative regret relative to the optimal eligible set in hindsight. We explore two dimensions of the problem setup: whether the agent behaves myopically or strategizes across rounds, and whether the solutions yield deterministic or stochastic utility. Our analysis characterizes regimes in which the principal can achieve sublinear regret, thereby shedding light on when repeated delegation succeeds and when it fails.
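To make the interaction concrete, the following is a minimal sketch of a single round and of the regret benchmark, restricted to the myopic agent with deterministic utilities. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: solutions are modeled as `(principal_utility, agent_utility)` pairs, the eligible set is assumed to be a threshold on the principal's utility, and all function names are hypothetical.

```python
def agent_best_response(solutions, threshold):
    """Myopic agent: among eligible solutions, propose the one it likes best.

    Assumption: the eligible set is {s : principal utility of s >= threshold}.
    Returns None when no solution is eligible.
    """
    eligible = [s for s in solutions if s[0] >= threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s[1])


def principal_utility(solutions, threshold):
    """Principal's utility in one round (0 if the agent proposes nothing)."""
    choice = agent_best_response(solutions, threshold)
    return 0.0 if choice is None else choice[0]


def cumulative_regret(rounds, played, grid):
    """Regret of the played thresholds against the best fixed threshold in `grid`.

    rounds: one list of solutions per round
    played: the threshold the principal announced in each round
    grid:   candidate thresholds for the hindsight benchmark (an assumption)
    """
    earned = sum(principal_utility(s, t) for s, t in zip(rounds, played))
    best_fixed = max(sum(principal_utility(s, t) for s in rounds) for t in grid)
    return best_fixed - earned


# Hypothetical example: utilities are misaligned, so an unrestricted agent
# proposes the solution that is worst for the principal.
example = [(0.2, 0.9), (0.6, 0.5), (0.9, 0.1)]
regret = cumulative_regret([example, example], played=[0.0, 0.0], grid=[0.0, 0.5, 0.9])
```

In this toy instance, announcing no restriction (threshold 0) lets the agent pick the solution worth 0.2 to the principal, while a threshold of 0.9 forces the solution worth 0.9, so the uninformed principal accumulates regret in every round until it learns a better eligible set.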

Regret Analysis of Repeated Delegated Choice