BriefGPT.xyz
Feb, 2024
Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains
Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi...
TL;DR
Studies the sequence-modeling ability of transformers through the lens of Markov chains, investigating both theoretically and experimentally the interplay between the properties of the data distribution, the transformer architecture, the learned distribution, and model performance.
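To make the data-distribution side of this setup concrete, below is a minimal sketch of sampling training sequences from a first-order binary Markov source — the kind of synthetic data such Markov-chain analyses of transformers typically use. The specific transition probabilities `p` and `q` and the function name are illustrative assumptions, not taken from the paper.

```python
import random

def sample_markov_chain(P, length, seed=0):
    """Sample a state sequence from a first-order Markov chain.

    P[i][j] is the probability of transitioning from state i to state j.
    The initial state is drawn uniformly at random.
    """
    rng = random.Random(seed)
    state = rng.randrange(len(P))
    seq = [state]
    for _ in range(length - 1):
        r = rng.random()
        cum = 0.0
        for j, prob in enumerate(P[state]):
            cum += prob
            if r < cum:
                state = j
                break
        seq.append(state)
    return seq

# Illustrative binary chain: flip probabilities p (0 -> 1) and q (1 -> 0).
p, q = 0.2, 0.3
P = [[1 - p, p], [q, 1 - q]]
seq = sample_markov_chain(P, 20)
```

A transformer pretrained on many such sequences can then be compared against the known source distribution, which is what makes this a controlled testbed for studying what the model actually learns.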
Abstract
In recent years, attention-based transformers have achieved tremendous success across a variety of disciplines including natural languages. A key ingredient behind their success is the generative pretraining procedure…