Contextual bandit algorithms are at the core of many applications, including
recommender systems, clinical trials, and optimal portfolio selection. One of
the most popular problems studied in the contextual bandit literature is to
maximize the sum of the rewards in each round by ensuri