We consider a sequential assortment selection problem where the user choice
is given by a multinomial logit (MNL) choice model whose parameters are
unknown. In each period, the learning agent observes $d$-dimensional
contextual information about the user and the $N$ available items, offers
an assortment of size $K$ to the user, and observes the bandit