Oct 2023
CoTFormer: More Tokens With Attention Make Up For Less Depth
Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
TL;DR
Building on the chain-of-thought (CoT) method in transformer models, the paper proposes CoTFormer, which uses an implicit CoT mechanism to achieve capability comparable to that of a deeper model, and empirically shows that it significantly outperforms larger standard transformers on downstream performance.
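
As a rough sketch of the implicit-CoT idea summarized above, the PyTorch snippet below reapplies one shared transformer block several times, appending each pass's outputs as extra tokens that later passes can attend to, so that attention over more tokens stands in for more layers. All names and hyperparameters here (ImplicitCoTBlock, n_repeat, the concatenation scheme) are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn

class ImplicitCoTBlock(nn.Module):
    # Hypothetical sketch: one shared transformer block applied n_repeat
    # times. Each pass appends its outputs as extra tokens for the next
    # pass to attend to ("more tokens with attention" instead of depth).
    def __init__(self, d_model=64, n_heads=4, n_repeat=3):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.n_repeat = n_repeat

    def forward(self, x):
        # x: (batch, seq, d_model); positional encoding and masking omitted.
        context = x
        for _ in range(self.n_repeat):
            out = self.block(context)                  # attend over full history
            latest = out[:, -x.size(1):, :]            # refreshed token states
            context = torch.cat([context, latest], 1)  # expose them to next pass
        return latest  # final-pass representations of the original tokens

model = ImplicitCoTBlock()
tokens = torch.randn(2, 8, 64)
print(model(tokens).shape)  # torch.Size([2, 8, 64])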
Abstract
The race to continually develop ever larger and deeper foundational models is underway. However, techniques like the chain-of-thought (CoT) method continue to play a pivotal role in achieving optimal downstream performance.