The goal of this work is to recognise and localise short temporal signals in
image time series, where strong supervision is not available for training.
To this end, we propose an image encoding that concisely represents human
motion in a video sequence in a form that is suitable for lea