The objective of many real-world tasks is complex and difficult to
procedurally specify. This makes it necessary to use reward or imitation
learning algorithms to infer a reward or policy directly from human data.
Existing benchmarks for these algorithms focus on realism, testing in complex
environments. Unfortunately, these benchmarks are slow, unreliable a