Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang...
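TL;DR: Proposes a learning framework that uses two algorithms to group the users of tiered user-interaction applications and handle their different tolerances for exploration risk separately, and studies the use of Pessimistic Value Iteration as the exploitation algorithm.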
Abstract
We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where users can be divided into two groups according to their tolerance for exploration risk and should be treated separately. In this setting, we simultan