Oct 2023
CoTFormer: More Tokens With Attention Make Up For Less Depth
Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
TL;DR
Building on the chain-of-thought (CoT) method in transformer models, the paper proposes CoTFormer, which uses an implicit CoT mechanism to achieve capability comparable to that of a deeper model, and empirically shows that it significantly outperforms larger standard transformers on downstream performance.
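
As a rough sketch of the implicit-CoT idea summarized above, the PyTorch snippet below reapplies one shared transformer block several times, appending each pass's outputs as extra tokens that later passes can attend to, so that attention over more tokens stands in for more layers. All names and hyperparameters here (ImplicitCoTBlock, n_repeat, the concatenation scheme) are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn

class ImplicitCoTBlock(nn.Module):
    # Hypothetical sketch: one shared transformer block applied n_repeat
    # times. Each pass appends its outputs as extra tokens for the next
    # pass to attend to ("more tokens with attention" instead of depth).
    def __init__(self, d_model=64, n_heads=4, n_repeat=3):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.n_repeat = n_repeat

    def forward(self, x):
        # x: (batch, seq, d_model); positional encoding and masking omitted.
        context = x
        for _ in range(self.n_repeat):
            out = self.block(context)                  # attend over full history
            latest = out[:, -x.size(1):, :]            # refreshed token states
            context = torch.cat([context, latest], 1)  # expose them to next pass
        return latest  # final-pass representations of the original tokens

model = ImplicitCoTBlock()
tokens = torch.randn(2, 8, 64)
print(model(tokens).shape)  # torch.Size([2, 8, 64])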
Abstract
The race to continually develop ever larger and deeper foundational models is underway. However, techniques like the chain-of-thought (CoT) method continue to play a pivotal role in achieving optimal downstream performance.