Paper Title

Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models

Authors

Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky

Abstract

Recent models for unsupervised representation learning of text have employed a number of techniques to improve contextual word representations but have put little focus on discourse-level representations. We propose CONPONO, an inter-sentence objective for pretraining language models that models discourse coherence and the distance between sentences. Given an anchor sentence, our model is trained to predict the text k sentences away using a sampled-softmax objective where the candidates consist of neighboring sentences and sentences randomly sampled from the corpus. On the discourse representation benchmark DiscoEval, our model improves over the previous state-of-the-art by up to 13% and on average 4% absolute across 7 tasks. Our model is the same size as BERT-Base, but outperforms the much larger BERT-Large model and other more recent approaches that incorporate discourse. We also show that CONPONO yields gains of 2%-6% absolute even for tasks that do not explicitly evaluate discourse: textual entailment (RTE), common sense reasoning (COPA), and reading comprehension (ReCoRD).
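
To make the objective described above concrete, the following is a minimal sketch of how a CONPONO-style contrastive inter-sentence loss could be computed for a single target distance k, assuming the anchor and candidate sentences have already been encoded into fixed-size vectors (e.g. by a BERT-Base-sized encoder). The function name conpono_style_loss, the tensor shapes, and the plain dot-product scoring are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of a contrastive inter-sentence objective in the spirit
# of CONPONO; names, shapes, and scoring function are assumptions.
import torch
import torch.nn.functional as F

def conpono_style_loss(anchor_vec: torch.Tensor,
                       candidate_vecs: torch.Tensor,
                       target_index: torch.Tensor) -> torch.Tensor:
    """Softmax cross-entropy over candidate sentences for one target distance k.

    anchor_vec:     [batch, hidden]          encoding of the anchor sentence
    candidate_vecs: [batch, n_cand, hidden]  encodings of neighboring sentences
                                             plus negatives sampled at random
                                             from the corpus
    target_index:   [batch]                  index of the candidate that is
                                             actually k sentences from the anchor
    """
    # Score each candidate against the anchor with a dot product, then treat
    # the scores as logits in a softmax over the candidate set.
    scores = torch.einsum("bh,bnh->bn", anchor_vec, candidate_vecs)
    return F.cross_entropy(scores, target_index)

# Toy usage with random "sentence encodings": 8 anchors, 6 candidates each.
if __name__ == "__main__":
    anchor = torch.randn(8, 768)
    candidates = torch.randn(8, 6, 768)
    labels = torch.randint(0, 6, (8,))
    print(conpono_style_loss(anchor, candidates, labels))
```

In this sketch the same cross-entropy would be applied once per target distance k (e.g. the sentences 1 or 2 positions before or after the anchor), so the model must distinguish the true neighbor at each distance from the other neighbors and from the random negatives.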
