To enable machines to learn how humans interact with the physical world in daily activities, it is crucial to provide rich data that captures the 3D motion of humans as well as the motion of objects in a learnable 3D representation. Ideally, this data should be collected in a na