BriefGPT.xyz
Oct 2020
Bayesian Attention Modules
Xinjie Fan, Shujian Zhang, Bo Chen, Mingyuan Zhou
TL;DR
This work proposes a scalable stochastic version of attention that is easy to implement and optimize. It constructs simplex-constrained attention distributions by normalizing reparameterizable distributions, and regularizes them by learning their parameters within a data-dependent prior framework. Applied to a variety of attention models, the method yields consistent improvements on graph node classification, visual question answering, image captioning, machine translation, and language understanding.
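The TL;DR only names the construction, so below is a minimal PyTorch sketch of what "normalizing a reparameterizable distribution to obtain simplex-constrained attention weights" could look like. The Weibull choice, the shape parameter, and the function name are illustrative assumptions rather than the paper's exact formulation, and the data-dependent prior with its KL regularizer is omitted.

```python
# Hypothetical sketch: stochastic attention built by normalizing reparameterizable
# samples across keys. The Weibull distribution and all hyperparameters here are
# illustrative assumptions, not the authors' exact design.
import torch
import torch.nn.functional as F

def stochastic_attention(query, key, value, k_shape=10.0, eps=1e-8):
    """Draw simplex-constrained attention weights from normalized
    reparameterizable samples instead of a deterministic softmax.

    query: (batch, n_q, d), key/value: (batch, n_k, d)
    """
    d = query.size(-1)
    scores = query @ key.transpose(-2, -1) / d ** 0.5   # (batch, n_q, n_k)
    phi = F.softmax(scores, dim=-1)                      # deterministic attention, used as the mean

    # Reparameterized Weibull sample with scale chosen so its mean matches phi:
    # if W ~ Weibull(k, lam), then W = lam * (-log(1 - U))**(1/k) with U ~ Uniform(0, 1),
    # and E[W] = lam * Gamma(1 + 1/k).
    lam = phi / torch.exp(torch.lgamma(torch.tensor(1.0 + 1.0 / k_shape)))
    u = torch.rand_like(phi).clamp(eps, 1 - eps)
    w = lam * (-torch.log(1 - u)) ** (1.0 / k_shape)

    attn = w / (w.sum(dim=-1, keepdim=True) + eps)       # normalize onto the simplex
    return attn @ value, attn
```

In a full training setup one would presumably add a KL term between this sampling distribution and the learned data-dependent prior, and fall back to the deterministic softmax weights at test time; those pieces are not shown here.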
Abstract
Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use …