Language Models (LMs) often struggle with linguistic understanding at the
discourse level, even though discourse patterns such as coherence, cohesion,
and narrative flow are prevalent in their pre-training data. Current methods
address these challenges only after the pre-training phase, relying on
expensive human-annotated data to align the model. To improve the discourse
capabilities of LMs already at the pre-training stage, we introduce DEPTH, an
encoder-decoder model that learns to represent sentences using a
discourse-oriented pre-training objective. DEPTH combines hierarchical sentence
representations with two objectives: (1) Sentence Un-Shuffling, and (2)
Span-Corruption. This approach trains the model to represent both
sub-word-level and sentence-level dependencies over a massive amount of
unstructured text. When trained either from scratch or continuing from a
pre-trained T5 checkpoint, DEPTH learns semantic and discourse-level
representations faster than T5, outperforming it in span-corruption loss
despite the additional sentence-un-shuffling objective. Evaluations on the
GLUE, DiscoEval, and NI benchmarks demonstrate DEPTH's ability to quickly learn
diverse downstream tasks, which require syntactic, semantic, and discourse
capabilities. Overall, our approach extends the discourse capabilities of T5,
while minimally impacting other natural language understanding (NLU)
capabilities in the resulting LM.
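
The sentence un-shuffling objective can be illustrated with a toy data-preparation step. The sketch below is only an assumption about how such an example might be built: the `<sent_k>` tags, the function name `make_unshuffling_example`, and the target format are hypothetical and are not taken from the abstract, and span corruption is left out.

```python
import random


def make_unshuffling_example(sentences, seed=0):
    """Build a toy (source, target, order) triple for sentence un-shuffling.

    Hypothetical format: each shuffled sentence is prefixed with a tag
    <sent_k>, and the target lists the tags in the original text order.
    """
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)  # order[k] = original index of the sentence shown at position k

    # Source: shuffled sentences, each tagged with its shuffled position.
    source = " ".join(f"<sent_{k}> {sentences[i]}" for k, i in enumerate(order))

    # Target: for each original sentence, the tag it received in the source.
    position_of = {orig: k for k, orig in enumerate(order)}
    target = " ".join(f"<sent_{position_of[i]}>" for i in range(len(sentences)))
    return source, target, order


if __name__ == "__main__":
    doc = ["She opened the door.", "The room was dark.", "She found the switch."]
    src, tgt, _ = make_unshuffling_example(doc, seed=1)
    print(src)
    print(tgt)
```

Reading the target tags left to right and looking up the sentence behind each tag recovers the original order, which is the signal a model would have to predict in this toy setup.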

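DEPTH is an encoder-decoder model that improves discourse-level understanding in language models by introducing a discourse-oriented objective during pre-training. By combining hierarchical sentence representations with two objectives, Sentence Un-Shuffling and Span-Corruption, DEPTH learns semantic and discourse-level representations faster than T5 and extends T5's discourse capabilities.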

DEPTH: Discourse Education through Pre-Training Hierarchically

Available corpora for Argument Mining differ along several axes, and one of
the key differences is the presence (or absence) of discourse markers to signal
argumentative content. Exploring effective ways to use discourse markers has
received wide attention in various discourse parsing tasks, from which it is
well-known that discourse markers are strong indicators of discourse relations.
To improve the robustness of Argument Mining systems across different genres,
we propose to automatically augment a given text with discourse markers such
that all relations are explicitly signaled. Our analysis reveals that popular
language models taken out-of-the-box fail on this task; however, when
fine-tuned on a new heterogeneous dataset that we construct (including
synthetic and real examples), they perform considerably better. We demonstrate
the impact of our approach on an Argument Mining downstream task, evaluated on
different corpora, showing that language models can be trained to automatically
fill in discourse markers across different corpora, improving the performance
of a downstream model in some, but not all, cases. Our proposed approach can
further be employed as an assistive tool for better discourse understanding.

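To improve the robustness of Argument Mining systems across genres, we propose automatically augmenting the input text with discourse markers so that all relations are explicitly signaled. We find that even popular language models fail on this task when used out of the box.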

Cross-Genre Argument Mining: Can Language Models Automatically Fill in Missing Discourse Markers?

Keeping track of how states and relations of entities change as a text or
dialog unfolds is a key prerequisite to discourse understanding. Despite this
fact, there have been few systematic investigations into the ability of large
language models (LLMs) to track discourse entities. In this work, we present a
task to probe to what extent a language model can infer the final state of an
entity given an English description of the initial state and a series of
state-changing operations. We use this task to first investigate whether
Flan-T5, GPT-3 and GPT-3.5 can track the state of entities, and find that only
GPT-3.5 models, which have been pretrained on large amounts of code, exhibit
this ability. We then investigate whether smaller models pretrained primarily
on text can learn to track entities, through finetuning T5 on several
training/evaluation splits. While performance degrades for more complex splits,
we find that even for splits with almost no lexical overlap between training
and evaluation, a finetuned model can often perform non-trivial entity
tracking. Taken together, these results suggest that language models can learn
to track entities but pretraining on large text corpora alone does not make
this capacity surface.
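
The probing task described above (an initial state, a series of state-changing operations, and a query about the final state) can be sketched with a small generator. The box-and-item world, the operation names, and the English templates below are assumptions made for illustration; the abstract does not specify the task's actual wording or format.

```python
def describe(state):
    """Render a state such as {0: {"key"}, 1: set()} as English sentences."""
    parts = []
    for box in sorted(state):
        items = sorted(state[box])
        contents = " and ".join(f"the {x}" for x in items) if items else "nothing"
        parts.append(f"Box {box} contains {contents}.")
    return " ".join(parts)


def apply_ops(state, ops):
    """Apply operations to a copy of the state; return (final_state, text)."""
    final = {box: set(items) for box, items in state.items()}
    lines = []
    for op in ops:
        kind = op[0]
        if kind == "move":
            _, item, src, dst = op
            final[src].remove(item)  # raises KeyError if the item is not in src
            final[dst].add(item)
            lines.append(f"Move the {item} from Box {src} to Box {dst}.")
        elif kind == "remove":
            _, item, src = op
            final[src].remove(item)
            lines.append(f"Remove the {item} from Box {src}.")
        elif kind == "put":
            _, item, dst = op
            final[dst].add(item)
            lines.append(f"Put the {item} into Box {dst}.")
        else:
            raise ValueError(f"unknown operation: {kind}")
    return final, " ".join(lines)


if __name__ == "__main__":
    initial = {0: {"key"}, 1: set(), 2: {"map", "coin"}}
    ops = [("move", "key", 0, 1), ("remove", "coin", 2), ("put", "book", 0)]
    final, op_text = apply_ops(initial, ops)
    print(describe(initial), op_text)  # prompt shown to the model
    print(describe(final))             # gold final state to compare against
```

A model's answer about any one box can then be scored against the corresponding sentence of `describe(final)`.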

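This paper examines how well large language models track changes in the states and relations of entities. Among the models tested, only GPT-3.5 models, which were pretrained on large amounts of code, show this ability, while smaller models pretrained primarily on text can learn non-trivial entity tracking through finetuning. The capacity therefore does not depend on model size alone, and pretraining on large text corpora alone is not enough to make it surface.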