In this paper, we study the actor-action semantic segmentation problem, which
requires jointly labeling the actor and action categories in video frames.
One major challenge for this task is that when an actor performs an action,
different body parts of the actor provide different typ