A simple method for domain adaptation of sentence embeddings

Paper title

A simple method for domain adaptation of sentence embeddings

Paper author

Kruspe, Anna

Abstract

Pre-trained sentence embeddings have been shown to be very useful for a variety of NLP tasks. Because training such embeddings requires a large amount of data, they are commonly trained on a variety of text data. Adapting them to specific domains could improve results in many cases, but such finetuning is usually problem-dependent and poses the risk of over-adapting to the data used for adaptation. In this paper, we present a simple universal method for finetuning Google's Universal Sentence Encoder (USE) using a Siamese architecture. We demonstrate how to use this approach for a variety of data sets and present results on different data sets representing similar problems. The approach is also compared to traditional finetuning on these data sets. As a further advantage, the approach can be used to combine data sets with different annotations. We also present an embedding finetuned on all data sets in parallel.
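
The abstract describes the Siamese finetuning setup only at a high level. The following plain-Python sketch illustrates the general idea under stated assumptions, not the paper's actual implementation: a single shared weight matrix W (a hypothetical stand-in for the trainable encoder; real USE finetuning updates the network itself) encodes both inputs of a pair, the loss is the squared error between the cosine similarity of the two outputs and a gold similarity target, and gradients from both branches update the same W. The toy 3-dimensional input vectors, the squared-error-on-cosine loss, and the mapping of a 0 to 5 graded score and a binary paraphrase label onto one [0, 1] target are all assumptions chosen for illustration.

```python
import math

Vector = list[float]
Matrix = list[list[float]]
Pair = tuple[Vector, Vector, float]  # (input a, input b, similarity target)


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def dot(u: Vector, v: Vector) -> float:
    return sum(p * q for p, q in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def pair_loss_and_grad(w: Matrix, a: Vector, b: Vector, target: float) -> tuple[float, Matrix]:
    """Squared error between cos(Wa, Wb) and target, plus dLoss/dW.

    Both branches use the same W (Siamese weight sharing), so the gradient
    is the sum of the contributions from the a-branch and the b-branch.
    """
    u, v = matvec(w, a), matvec(w, b)
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    c = dot(u, v) / (nu * nv)
    dl_dc = 2.0 * (c - target)
    dc_du = [vi / (nu * nv) - c * ui / nu**2 for ui, vi in zip(u, v)]
    dc_dv = [ui / (nu * nv) - c * vi / nv**2 for ui, vi in zip(u, v)]
    grad = [[dl_dc * (dc_du[i] * a[j] + dc_dv[i] * b[j]) for j in range(len(a))]
            for i in range(len(w))]
    return (c - target) ** 2, grad


def total_loss(w: Matrix, pairs: list[Pair]) -> float:
    return sum((cosine(matvec(w, a), matvec(w, b)) - t) ** 2 for a, b, t in pairs)


def finetune(w: Matrix, pairs: list[Pair], lr: float = 0.05, steps: int = 200) -> Matrix:
    """Full-batch gradient descent on the shared weights."""
    w = [row[:] for row in w]  # copy so the caller's weights stay untouched
    for _ in range(steps):
        grad_sum = [[0.0] * len(w[0]) for _ in w]
        for a, b, t in pairs:
            _, g = pair_loss_and_grad(w, a, b, t)
            for i, row in enumerate(g):
                for j, gij in enumerate(row):
                    grad_sum[i][j] += gij
        for i in range(len(w)):
            for j in range(len(w[0])):
                w[i][j] -= lr * grad_sum[i][j]
    return w


# Two toy data sets with different (hypothetical) annotation schemes, both
# mapped onto one similarity target in [0, 1] so they can be pooled.
sts_style = [([1.0, 0.0, 0.5], [0.0, 1.0, 0.5], 5.0)]  # graded score 0..5
paraphrase = [([1.0, 0.0, 0.5], [1.0, 1.0, 0.0], 0)]   # binary label 0/1
pairs = ([(a, b, s / 5.0) for a, b, s in sts_style]
         + [(a, b, float(y)) for a, b, y in paraphrase])

w0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w1 = finetune(w0, pairs)
print(round(total_loss(w0, pairs), 3), round(total_loss(w1, pairs), 3))
```

Weight sharing is what makes the setup Siamese: every pair contributes gradients from both of its branches to the same W. Once labels from differently annotated data sets are normalized to a common target, their pairs can simply be pooled into one training set, which is one plausible reading of how such an approach can combine data sets.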
