BriefGPT.xyz
Jul, 2019
R-Transformer: Recurrent Neural Network Enhanced Transformer
Zhiwei Wang, Yao Ma, Zitao Liu, Jiliang Tang
TL;DR
This paper proposes the R-Transformer model, which combines the strengths of RNNs and the multi-head attention mechanism while avoiding their respective drawbacks: it effectively captures both local structure and global long-term dependencies in sequences without using position embeddings. Extensive experimental evaluation shows that R-Transformer outperforms state-of-the-art methods on most tasks.
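A minimal NumPy sketch of the idea summarized above: a windowed RNN captures local structure position-by-position (so no position embeddings are needed), and its outputs are then fed to multi-head self-attention for global dependencies. The vanilla tanh RNN cell, window size, and identity Q/K/V projections here are simplifications for illustration, not the paper's actual layers.

```python
import numpy as np

def local_rnn(x, W_h, W_x, window=3):
    # x: (seq_len, d). For each position t, run a simple RNN over the
    # last `window` inputs ending at t and keep the final hidden state.
    # The sliding window injects order information locally, which is why
    # no position embedding is required.
    seq_len, d = x.shape
    out = np.zeros_like(x)
    for t in range(seq_len):
        h = np.zeros(d)
        for s in range(max(0, t - window + 1), t + 1):
            h = np.tanh(h @ W_h + x[s] @ W_x)
        out[t] = h
    return out

def multi_head_attention(x, n_heads=2):
    # Scaled dot-product self-attention; identity Q/K/V projections are
    # a simplification (the real model learns these projections).
    seq_len, d = x.shape
    dh = d // n_heads
    heads = []
    for i in range(n_heads):
        q = k = v = x[:, i * dh:(i + 1) * dh]
        scores = q @ k.T / np.sqrt(dh)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        heads.append(weights @ v)
    return np.concatenate(heads, axis=-1)

rng = np.random.default_rng(0)
seq_len, d = 8, 4
x = rng.normal(size=(seq_len, d))
W_h = rng.normal(size=(d, d)) * 0.1
W_x = rng.normal(size=(d, d)) * 0.1

local = local_rnn(x, W_h, W_x)    # local dependencies via windowed RNN
y = multi_head_attention(local)   # global dependencies via attention
print(y.shape)  # (8, 4)
```

Stacking such a local-RNN layer before each attention layer is the core design choice the TL;DR describes: recurrence for short-range order, attention for long-range interaction.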
Abstract
Recurrent neural networks have long been the dominating choice for sequence modeling. However, they severely suffer from two issues: they are impotent in capturing very long-term dependencies and unable to parallelize the …