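TL;DR: This paper explores using a parameter-free estimator to estimate state values in image captioning and training over N steps with a reformulated advantage function. On the MSCOCO dataset, this approach outperforms both sequence-level advantage and parameterized value estimation methods.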
Abstract
Existing methods for image captioning are usually trained with cross-entropy
loss, which leads to exposure bias and a mismatch between the training
objective and the evaluation metrics. Recently it has been shown that these two
issues can be addressed by incorporating techniques from