Although significant progress has been made in the field of automatic image
captioning, it is still a challenging task. Previous works normally pay much
attention to improving the quality of the generated captions but ignore the
diversity of captions. In this paper, we combine determinantal p