We study the problem of robust reinforcement learning under adversarial
corruption on both rewards and transitions. Our attack model assumes an
\textit{adaptive} adversary who can arbitrarily corrupt the reward and
transition at every step within an episode, for at most $\epsilon$-frac