This paper strives to localize the temporal extent of an action in a long
untrimmed video. Whereas existing work relies on many training examples annotated
with their start, their end, and/or the class of the action, we propose
few-shot common action localization. The start and end