We study a general class of contextual bandits, where each context-action
pair is associated with a raw feature vector, but the reward generating
function is unknown. We propose a novel learning algorithm that transforms the
raw feature vector using the last hidden layer of a deep ReLU