We consider a contextual combinatorial bandit problem in which, in each round, a
learning agent selects a subset of arms and receives feedback on the selected
arms according to their scores. The score of an arm is an unknown function of
the arm's feature. Approximating this unknown score fu