We approach the problem of learning by watching humans in the wild. While
traditional approaches in Imitation and Reinforcement Learning are promising
for learning in the real world, they are either sample inefficient or are
constrained to lab settings. Meanwhile, there has been a lot of success in
processing passive, unstructured human data. We propose tack