We revisit the classic problem of optimal subset selection in the online
learning set-up. Assume that the set $[N]$ consists of $N$ distinct elements.
On the $t$th round, an adversary chooses a monotone reward function $f_t:
2^{[N]} \to \mathbb{R}_+$ that assigns a non-negative reward to each subset of
$[N]$. An online policy selects (possibly at random) a subset $S_t \subseteq [N]$
consisting of $k$ elements before the reward function $f_t$ for the $t$th round
is revealed to the learner. As a consequence of its choice, the policy receives
a reward of $f_t(S_t)$ on the $t$th round. Our goal is to design an online
sequential subset selection policy that maximizes the expected cumulative reward
over a given time horizon. In this connection, we propose an online
learning policy called SCore (Subset Selection with Core) that solves the
problem for a large class of reward functions. The proposed SCore policy is
based on a new polyhedral characterization of the reward functions called
$\alpha$-Core, a generalization of the Core from the cooperative game theory
literature. We establish a learning guarantee for the SCore policy in terms of
a new performance metric called $\alpha$-augmented regret. In this new metric,
the performance of the online policy is compared with an unrestricted offline
benchmark that can select all $N$ elements at every round. We show that a large
class of reward functions, including submodular functions, can be efficiently optimized
with the SCore policy. We also extend the proposed policy to the optimistic
learning set-up where the learner has access to additional untrusted hints
regarding the reward functions. Finally, we conclude the paper with a list of
open problems.
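
The interaction protocol described above can be illustrated with a minimal Python sketch. It is not the SCore policy: the learner here is a placeholder that picks $k$ elements uniformly at random, and the adversary is simulated with a hypothetical additive reward with random non-negative weights (one simple example of a monotone function). All names (`make_additive_reward`, `uniform_policy`, `run_protocol`) are assumptions made for illustration. The sketch only shows the order of play and the two quantities being compared: the reward collected by the policy and the reward of the unrestricted benchmark that selects all $N$ elements at every round.

```python
import random
from typing import Callable, FrozenSet, List, Tuple

RewardFn = Callable[[FrozenSet[int]], float]


def make_additive_reward(weights: List[float]) -> RewardFn:
    """Hypothetical monotone reward: sum of non-negative weights in the subset."""
    def f(subset: FrozenSet[int]) -> float:
        return sum(weights[i] for i in subset)
    return f


def uniform_policy(n: int, k: int, rng: random.Random) -> FrozenSet[int]:
    """Placeholder learner (NOT SCore): pick k of the n elements uniformly at random."""
    return frozenset(rng.sample(range(n), k))


def run_protocol(n: int, k: int, horizon: int, seed: int = 0) -> Tuple[float, float]:
    """Play `horizon` rounds; return (policy reward, all-N benchmark reward)."""
    rng = random.Random(seed)
    full_set = frozenset(range(n))
    policy_total = 0.0
    benchmark_total = 0.0
    for _ in range(horizon):
        # The adversary fixes f_t; here it is simulated with random weights.
        f_t = make_additive_reward([rng.random() for _ in range(n)])
        # The learner commits to S_t without seeing f_t.
        s_t = uniform_policy(n, k, rng)
        # f_t is revealed and the reward f_t(S_t) is collected.
        policy_total += f_t(s_t)
        # The unrestricted offline benchmark selects all N elements.
        benchmark_total += f_t(full_set)
    return policy_total, benchmark_total


if __name__ == "__main__":
    got, best = run_protocol(n=10, k=3, horizon=100)
    print(f"policy reward: {got:.2f}, all-N benchmark reward: {best:.2f}")
```

Because every reward function is monotone and non-negative, the policy's total can never exceed the benchmark's total in this sketch; how the two are weighted against each other in the $\alpha$-augmented regret is specified in the paper and is not reproduced here.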

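This work proposes an online learning policy called SCore for optimal subset selection under a broad class of reward functions, and introduces a new performance metric called $\alpha$-augmented regret. It shows that a large class of reward functions, including submodular functions, can be optimized efficiently with the SCore policy.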