To predict the next token, autoregressive models ordinarily examine the past.
Could they also benefit from examining hypothetical futures? We consider a
novel Transformer-based autoregressive architecture that estimates the
next-token distribution by extrapolating multiple continuations of the past,
according to some proposal distribution, and attending to these extended
strings. This architecture draws insights from classical AI systems such as
board game players: when making a local decision, a policy may benefit from
exploring possible future trajectories and analyzing them. On multiple tasks
including morphological inflection and Boolean satisfiability, our lookahead
model outperforms an ordinary Transformer of comparable size.
However, on some tasks, it appears to benefit from the extra computation
without actually using the lookahead information. We discuss possible variant
architectures as well as future speedups.

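This paper introduces a novel Transformer-based autoregressive architecture that estimates the next-token distribution by extrapolating multiple continuations of the past according to some proposal distribution and attending to these extended strings, with the aim of improving the performance of autoregressive models.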
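
The lookahead idea can be illustrated with a toy sketch. This is not the paper's Transformer architecture: the learned attention over extended strings is replaced here by a softmax over a hand-written scoring function, and the names `sample_rollouts`, `lookahead_distribution`, `proposal`, and `score` are hypothetical, introduced only for illustration. The sketch samples `k` continuations of the prefix from a proposal distribution, weights each extended string by its score, and reads off a distribution over the first token that follows the prefix.

```python
import math
import random
from collections import defaultdict


def sample_rollouts(prefix, proposal, k, depth, rng):
    """Sample k continuations of `prefix`, each `depth` tokens long.

    `proposal(seq)` is assumed to return (tokens, weights) for the next token.
    """
    rollouts = []
    for _ in range(k):
        seq = list(prefix)
        for _ in range(depth):
            tokens, weights = proposal(seq)
            seq.append(rng.choices(tokens, weights=weights, k=1)[0])
        rollouts.append(seq)
    return rollouts


def lookahead_distribution(prefix, proposal, score, k=8, depth=3, seed=0):
    """Estimate a next-token distribution from sampled futures.

    Stand-in for attention: a softmax over `score(rollout)` weights each
    extended string, and the weight is credited to the rollout's first
    new token.
    """
    rng = random.Random(seed)
    rollouts = sample_rollouts(prefix, proposal, k, depth, rng)
    scores = [score(r) for r in rollouts]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dist = defaultdict(float)
    for rollout, weight in zip(rollouts, weights):
        dist[rollout[len(prefix)]] += weight / total
    return dict(dist)


def uniform_proposal(seq):
    # Toy proposal: the next token is "a" or "b" with equal probability.
    return ["a", "b"], [1.0, 1.0]


def count_a(rollout):
    # Toy score: futures containing more "a" tokens are rated higher.
    return float(rollout.count("a"))


if __name__ == "__main__":
    print(lookahead_distribution(["b"], uniform_proposal, count_a, k=32))
```

In this toy setting, rollouts that begin with a token leading to higher-scoring futures receive more weight, which mirrors the board-game intuition of judging a move by the trajectories it opens up. The cost grows with `k * depth` proposal calls per predicted token.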