BriefGPT.xyz
Jan, 2021
Latent Alignment of Procedural Concepts in Multimodal Recipes
Hossein Rajaby Faghihi, Roshanak Mirzaee, Sudarshan Paliwal, Parisa Kordjamshidi
TL;DR
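This work proposes a new approach to contextual reasoning for tasks that combine images and instructions. It uses attention mechanisms, cross-modal representations, and a latent alignment space between instructions and candidate answers, and the reported results show a 19% improvement over the baselines.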
Abstract
We propose a novel alignment mechanism to deal with procedural reasoning on a newly released multimodal QA dataset, named RecipeQA. Our mo