Feb, 2025
Adaptive Exploration for Multi-Reward Multi-Policy Evaluation
Alessio Russo, Aldo Pacchiano
TL;DR
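This work addresses the policy evaluation problem in the online multi-reward multi-policy discounted setting, and is the first to propose an $(\epsilon,\delta)$-PAC perspective for evaluating multiple reward functions simultaneously. By adopting a modified MR-NaS exploration scheme, we jointly minimize the sample complexity of evaluating different policies over different reward sets. Experimental results demonstrate the effectiveness of this adaptive exploration scheme.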
Abstract
We study the policy evaluation problem in an online multi-reward multi-policy discounted setting, where multiple reward functions must be evaluated simultaneously for different policies. We adopt an $(\epsilon,\d