Given the difficulty of manually annotating motion in video, the current best
motion estimation methods are trained on synthetic data, and therefore
struggle somewhat on real video due to a train/test gap. Self-supervised methods hold the
promise of training directly on real video, but typically p