BriefGPT.xyz
Jun, 2018
Bayesian Counterfactual Risk Minimization
Ben London, Ted Sandler
TL;DR
Takes a Bayesian view of offline learning from logged bandit feedback, derives a new generalization bound for estimating risk, and introduces a new regularization technique to avoid overfitting.
Abstract
We present a Bayesian view of counterfactual risk minimization (CRM), also known as offline policy optimization from logged bandit feedback.
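To make "logged bandit feedback" concrete, here is a minimal sketch of how the risk of a new policy might be estimated from data collected by a different logging policy. It uses a truncated inverse propensity scoring (IPS) estimate, a standard choice for this setting; the excerpt above does not say which estimator the authors use, so treat this as an assumption. The function name, the `tau` default, and the sample numbers are all hypothetical.

```python
def truncated_ips_risk(losses, logging_probs, target_probs, tau=0.01):
    """Estimate a target policy's risk from logged bandit feedback.

    losses[i]        : loss observed for the action the logging policy took
    logging_probs[i] : probability the logging policy assigned to that action
    target_probs[i]  : probability the target policy assigns to the same action
    tau              : floor on the logging probability (hypothetical default),
                       which caps each importance weight at 1 / tau
    """
    total = 0.0
    for loss, p_log, p_new in zip(losses, logging_probs, target_probs):
        # Reweight each logged loss by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = p_new / max(p_log, tau)
        total += weight * loss
    return total / len(losses)


# Three logged interactions with made-up numbers, for illustration only.
risk = truncated_ips_risk(
    losses=[1.0, 0.0, 1.0],
    logging_probs=[0.5, 0.25, 0.5],
    target_probs=[0.25, 0.5, 0.5],
)
print(risk)
```

Flooring the logging probability trades a small bias for lower variance, which is one reason regularization and generalization bounds matter when the estimate is minimized over policies.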