The explore{exploit dilemma is one of the central challenges in Reinforcement
Learning (RL). bayesian rl solves the dilemma by providing the agent with
information in the form of a prior distribution over environments; however,
full Bayesian planning is intractable. Planning with the m