In this paper, we propose Cross-Thought, a novel approach to pre-training sequence encoders, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering. Instead of using the original signals of full sentences, we train a Transformer-based sequence encoder over a large set of short sequences, which allows the model to automatically select the most useful information for predicting masked words. Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders trained with continuous sentence signals, as well as traditional masked language modeling baselines. Our proposed approach also achieves a new state of the art on HotpotQA (full-wiki setting) by improving intermediate information retrieval performance.
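The abstract only summarizes the training setup, so the following is a rough illustration rather than the authors' implementation. It sketches the two ingredients the abstract names: cutting text into short sequences whose words are partly masked, and letting one sequence weigh information drawn from the others. The scaled dot-product attention step is an assumption standing in for however the encoder actually combines sequences. The function names, `max_len=4`, the 15% masking rate (borrowed from BERT-style masked language modeling), and the 2-d toy embeddings are all hypothetical, and no Transformer is trained here.

```python
import math
import random

MASK = "[MASK]"  # hypothetical mask token


def split_into_short_sequences(tokens, max_len):
    """Chunk a long token list into consecutive short sequences of at most max_len tokens."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def mask_tokens(seq, mask_prob, rng):
    """Replace each token by MASK with probability mask_prob.

    Returns the masked sequence and a dict {position: original token}
    holding the prediction targets for the masked positions.
    """
    masked, targets = [], {}
    for i, tok in enumerate(seq):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_sequence_attention(query, keys):
    """Assumed mechanism: scaled dot-product attention of one sequence
    embedding (query) over the embeddings of other sequences (keys).

    The weights show which sequences are "selected" as most useful;
    the context is their weighted average.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    context = [sum(w * key[j] for w, key in zip(weights, keys)) for j in range(d)]
    return weights, context


# Usage on a toy "document" (hypothetical text and settings).
rng = random.Random(0)
text = ("cross thought trains a sequence encoder over many short "
        "sequences drawn from the same documents").split()
seqs = split_into_short_sequences(text, max_len=4)
batch = [mask_tokens(s, 0.15, rng) for s in seqs]

# Toy 2-d embeddings for three other sequences (hypothetical values).
weights, context = cross_sequence_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```

In this toy setting the sequence whose embedding points in the same direction as the query receives the largest attention weight, which is one simple way a model could "select" the most useful neighbouring sequences when recovering masked words.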

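This paper proposes Cross-Thought, a method for pre-training sequence encoders. It trains a Transformer-based sequence encoder on a large number of short sequences so that the model automatically selects the information most useful for predicting masked words, for use in large-scale natural language processing tasks such as question answering and textual entailment. Experimental results show that the proposed method outperforms state-of-the-art encoders based on continuous sentence signals as well as traditional masked language modeling baselines, and that it sets a new state of the art on HotpotQA (full-wiki setting) through improved intermediate information retrieval.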

Cross-Thought for Sentence Encoder Pre-training