The key assumption underlying linear Markov decision processes (MDPs) is that
the learner has access to a known feature map $\phi(x, a)$ that maps
state-action pairs to $d$-dimensional vectors, and that the rewards and
transitions are linear functions in this representation. But where