dense captioning aims at simultaneously localizing semantic regions and
describing these regions-of-interest (ROIs) with short phrases or sentences in
natural language. Previous studies have shown remarkable progresses, but they
are often vulnerable to the aperture problem that a capti