Non-Stochastic Control with Bandit Feedback

Aug, 2020

Non-Stochastic Control with Bandit Feedback

Paula Gradu, John Hallman, Elad Hazan

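TL;DR: This paper studies the problem of controlling a linear dynamical system with adversarial perturbations, where the only feedback available to the controller is the scalar loss and the loss function itself is unknown. For this problem, whether the system is known or unknown, we give an efficient sublinear-regret algorithm. We also propose a general bandit convex optimization algorithm for loss functions with memory, which may be of independent interest.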
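To make the feedback model concrete, here is a minimal sketch of the interaction loop, not the paper's algorithm. It assumes a scalar system x_{t+1} = a*x_t + b*u_t + w_t, a fixed linear policy u_t = -k*x_t, a quadratic loss, and a sinusoidal perturbation; all of these names and choices are hypothetical and picked only for illustration. The point it shows is that the controller receives one scalar loss per step and never sees the loss function itself.

```python
import math


def run_bandit_control(k, horizon=50, a=0.9, b=1.0):
    """Simulate x_{t+1} = a*x_t + b*u_t + w_t under the policy u_t = -k*x_t.

    Returns the list of scalar losses, which is the only feedback
    a controller would observe in the bandit setting.
    All parameters here are illustrative assumptions.
    """

    def hidden_loss(x, u):
        # Unknown to the controller; a quadratic cost assumed for illustration.
        return x * x + u * u

    x = 0.0
    losses = []
    for t in range(horizon):
        u = -k * x                         # control chosen from the current state
        losses.append(hidden_loss(x, u))   # scalar feedback only
        w = 0.5 * math.sin(0.3 * t)        # bounded, non-stochastic perturbation
        x = a * x + b * u + w              # linear dynamics
    return losses


# Compare the cumulative loss of a feedback policy with applying no control.
total_feedback = sum(run_bandit_control(k=0.5))
total_no_control = sum(run_bandit_control(k=0.0))
```

Regret in this setting would compare a learner's cumulative loss against the best fixed policy in some comparator class, judged in hindsight; the sketch above only evaluates two fixed policies and does no learning.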

Abstract

We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the →