Image captioning, an active research topic at the intersection of vision and
language, has seen significant progress in recent years. However, existing
methods tend to yield overly general captions built from the most frequent
words/phrases, resulting in inaccurate and indistinguis