Paper Title

Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings

Paper Authors

Yiren Jian, Chongyang Gao, Soroush Vosoughi

Paper Abstract

Semantic representation learning for sentences is an important and well-studied problem in NLP. The current trend for this task involves training a Transformer-based sentence encoder through a contrastive objective with text, i.e., clustering sentences with semantically similar meanings and scattering others. In this work, we find the performance of Transformer models as sentence encoders can be improved by training with multi-modal multi-task losses, using unpaired examples from another modality (e.g., sentences and unrelated image/audio data). In particular, besides learning by the contrastive loss on text, our model clusters examples from a non-linguistic domain (e.g., visual/audio) with a similar contrastive loss at the same time. The reliance of our framework on unpaired non-linguistic data makes it language-agnostic, enabling it to be widely applicable beyond English NLP. Experiments on 7 semantic textual similarity benchmarks reveal that models trained with the additional non-linguistic (images/audio) contrastive objective lead to higher quality sentence embeddings. This indicates that Transformer models are able to generalize better by doing a similar task (i.e., clustering) with unpaired examples from different modalities in a multi-task fashion.
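
The abstract describes a two-branch objective: a standard contrastive loss on text, plus the same kind of contrastive loss applied to unpaired examples from another modality, combined in a multi-task fashion. Below is a minimal PyTorch sketch of that idea, assuming a SimCSE-style text branch (two forward passes with different dropout masks as positive pairs) and pre-augmented image views for the non-linguistic branch; the names `text_encoder`, `image_encoder`, and the weight `lam` are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss: row i of z1 should match row i of z2 (the positive pair),
    while every other row in the batch serves as a negative."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (B, B) cosine-similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def multitask_step(text_encoder, image_encoder,
                   token_batch, images_v1, images_v2, lam=0.1):
    # Text branch (SimCSE-style, an assumption): encoding the same batch twice
    # in train mode yields two dropout-noised views of each sentence.
    t1 = text_encoder(token_batch)
    t2 = text_encoder(token_batch)
    loss_text = info_nce(t1, t2)

    # Non-linguistic branch: the images are UNPAIRED with the sentences; the
    # loss only clusters two augmented views of each image within its batch.
    v1 = image_encoder(images_v1)
    v2 = image_encoder(images_v2)
    loss_nonling = info_nce(v1, v2)

    # Multi-task combination; lam is a hypothetical weighting knob for the
    # auxiliary non-linguistic objective.
    return loss_text + lam * loss_nonling
```

The crucial property, per the abstract, is that `images_v1`/`images_v2` require no pairing with the sentence batch: the auxiliary loss only asks the Transformer to perform the same clustering task within the other modality, which is what makes the framework language-agnostic.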
