video paragraph captioning aims to describe multiple events in untrimmed
videos with descriptive paragraphs. Existing approaches mainly solve the
problem in two steps: event detection and then event captioning. Such two-step
manner makes the quality of generated paragraphs highly depen