Building systems that achieve a deeper understanding of language is one of
the central goals of natural language processing (NLP). Towards this goal,
recent work has begun to train language models on narrative datasets that
require extracting the most critical information by integrating across long
contexts. However, it remains an open question whether these models are
learning a deeper understanding of the text or simply learning a heuristic
to complete the task. This work investigates the question further by turning
to the one language processing system that truly understands complex
language: the human brain. We show that training language models for deeper
narrative understanding results in richer representations that have improved
alignment to human brain activity. We further find that the improvements in
brain alignment are larger for character names than for other discourse
features, which indicates that these models are learning important narrative
elements. Taken together, these results suggest that this type of training can
indeed lead to deeper language understanding. These findings have consequences
both for cognitive neuroscience by revealing some of the significant factors
behind brain-NLP alignment, and for NLP by highlighting that understanding of
long-range context can be improved beyond language modeling.

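Training NLP systems to understand language deeply is one of the field's central goals. Taking human brain processing of natural language as its reference point, this paper investigates whether language models trained on narrative datasets for deeper narrative understanding truly learn a deeper understanding of text. It shows that such training yields better brain-NLP alignment and improves the models' understanding of long-range context.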

Training language models to summarize narratives improves brain alignment

Language models are generally trained on short, truncated input sequences,
which limits their ability to use discourse-level information present in
long-range context to improve their predictions. Recent efforts to improve the
efficiency of self-attention have led to a proliferation of long-range
Transformer language models, which can process much longer sequences than
models of the past. However, the ways in which such models take advantage of
the long-range context remain unclear. In this paper, we perform a fine-grained
analysis of two long-range Transformer language models (including the
Routing Transformer, which achieves state-of-the-art perplexity on the
PG-19 long-sequence LM benchmark dataset) that accept input sequences of up to
8K tokens. Our results reveal that providing long-range context (i.e., beyond
the previous 2K tokens) to these models only improves their predictions on a
small set of tokens (e.g., those that can be copied from the distant context)
and does not help at all for sentence-level prediction tasks. Finally, we
discover that PG-19 contains a variety of different document types and domains,
and that long-range context helps most for literary novels (as opposed to
textbooks or magazines).

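This study analyzes two long-range Transformer language models that accept inputs of up to 8K tokens. It finds that providing long-range context improves their predictions on only a small set of tokens (for example, tokens that can be copied from the distant context) and does not help at all on sentence-level prediction tasks. Long-range context helps most for literary novels.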
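
The token-level comparison described above can be illustrated with a minimal sketch. It assumes per-token losses (negative log-likelihoods) have already been computed twice for the same target tokens, once with a short context and once with a long one. The function names, the `min_gain` threshold, and the numbers are all hypothetical and are not taken from the paper.

```python
def context_gain(loss_short, loss_long):
    """Per-token loss reduction when moving from short to long context."""
    if len(loss_short) != len(loss_long):
        raise ValueError("loss lists must align token by token")
    return [s - l for s, l in zip(loss_short, loss_long)]


def fraction_helped(gains, min_gain=0.1):
    """Share of tokens whose loss drops by more than min_gain (an assumed cutoff)."""
    return sum(g > min_gain for g in gains) / len(gains)


# Made-up losses for five target tokens; only the fourth token benefits,
# mimicking a token that can be copied from the distant context.
loss_2k = [2.0, 3.5, 1.2, 4.0, 2.8]
loss_8k = [2.0, 3.5, 1.2, 1.0, 2.8]

gains = context_gain(loss_2k, loss_8k)
share = fraction_helped(gains)
print(gains, share)
```

A small `share` alongside a few large entries in `gains` is the pattern the abstract reports: most tokens are unaffected, while a few gain a lot.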

Do Long-Range Language Models Actually Use Long-Range Context?

In dialogues, an utterance is a chain of consecutive sentences produced by
one speaker, ranging from a short sentence to a thousand-word post. When
studying dialogues at the utterance level, it is not uncommon for an utterance
to serve multiple functions. For instance, "Thank you. It works great."
expresses both gratitude and positive feedback in the same utterance. Multiple
dialogue acts (DAs) for one utterance breed complex dependencies across
dialogue turns. Therefore, DA recognition challenges a model's predictive power
over long utterances and complex DA context. We term this problem Concurrent
Dialogue Acts (CDA) recognition. Previous work on DA recognition either assumes
one DA per utterance or fails to account for the sequential nature of
dialogues. In this paper, we present an adapted Convolutional Recurrent Neural
Network (CRNN) that models the interactions between utterances across
long-range context. Our
model significantly outperforms existing work on CDA recognition on a tech
forum dataset.

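This paper studies how to recognize concurrent dialogue acts in dialogues with a convolutional recurrent neural network (CRNN). It addresses the difficulty of prediction over long utterances and complex dialogue-act context, and it outperforms existing work on a tech forum dataset.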
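
The difference between one DA per utterance and concurrent DAs can be sketched as a multi-label encoding. This is a minimal illustration, not the paper's CRNN: the label set, the function names, the 0.5 threshold, and the scores are all assumptions made for the example.

```python
# Hypothetical label set; the source does not list the actual DA tags.
DA_LABELS = ["gratitude", "positive_feedback", "question", "answer"]


def encode_multi_hot(acts, labels=DA_LABELS):
    """Return a 0/1 vector with one slot per label; several slots may be 1."""
    active = set(acts)
    unknown = active - set(labels)
    if unknown:
        raise ValueError(f"unknown dialogue acts: {sorted(unknown)}")
    return [1 if label in active else 0 for label in labels]


def decode_predictions(scores, labels=DA_LABELS, threshold=0.5):
    """Keep every label whose independent score reaches the threshold."""
    return [label for label, s in zip(labels, scores) if s >= threshold]


# "Thank you. It works great." carries two acts at once.
target = encode_multi_hot(["gratitude", "positive_feedback"])

# Made-up per-label scores, such as a model with one sigmoid output per label
# might produce.
predicted = decode_predictions([0.91, 0.78, 0.05, 0.12])
print(target, predicted)
```

Scoring each label independently, rather than picking the single highest-scoring label, is what lets one utterance receive several acts at once.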