Temporally locating and classifying action segments in long untrimmed videos
is of particular interest to many applications like surveillance and robotics.
While traditional approaches follow a two-step pipeline, by generating
frame-wise probabilities and then feeding them to high-level temporal models,
recent approaches use temporal convolutions to directly