Predicting upcoming events is critical to our ability to interact with our
environment. Transformer models, trained on next-word prediction, appear to
construct representations of linguistic input that can support diverse
downstream tasks. But how does a predictive objective shape such
representations? Inspired by recent work in vision (Henaff et al., 2019), we
test a hypothesis about predictive representations of autoregressive
transformers. In particular, we test whether the neural trajectory of a
sentence becomes progressively straighter as it passes through the network
layers. The key insight is that straighter trajectories should facilitate
prediction via linear extrapolation. We quantify straightness using a
1-dimensional curvature metric, and present four findings in support of the
trajectory straightening hypothesis: i) In trained models, the curvature
decreases from the early to the deeper layers of the network. ii) Models that
perform better on the next-word prediction objective exhibit greater decreases
in curvature, suggesting that this improved ability to straighten sentence
trajectories may be the driver of better language modeling performance. iii)
Given the same linguistic context, the sequences that are generated by the
model have lower curvature than the actual continuations observed in a language
corpus, suggesting that the model favors straighter trajectories for making
predictions. iv) A consistent relationship holds between the average curvature
and the average surprisal of sentences in the deep model layers, such that
sentences with straighter trajectories also have lower surprisal. Importantly,
untrained models do not exhibit these behaviors. In tandem, these results
support the trajectory straightening hypothesis and provide a possible
mechanism for how the geometry of the internal representations of
autoregressive models supports next word prediction.

用于预测的自回归变换器的预测表示通过逐渐变得更加直线化来实现更好的语言建模性能，并与句子的惊异程度之间存在一致的关系。