Paper title
EASE: Entity-Aware Contrastive Learning of Sentence Embedding
Paper authors
Paper abstract
We present EASE, a novel method for learning sentence embeddings via contrastive learning between sentences and their related entities. The advantage of using entity supervision is twofold: (1) entities have been shown to be a strong indicator of text semantics and thus should provide rich training signals for sentence embeddings; (2) entities are defined independently of languages and thus offer useful cross-lingual alignment supervision. We evaluate EASE against other unsupervised models both in monolingual and multilingual settings. We show that EASE exhibits competitive or better performance in English semantic textual similarity (STS) and short text clustering (STC) tasks and it significantly outperforms baseline methods in multilingual settings on a variety of tasks. Our source code, pre-trained models, and newly constructed multilingual STC dataset are available at https://github.com/studio-ousia/ease.
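To make the idea of contrastive learning between sentences and their related entities concrete, here is a minimal sketch in Python. The abstract does not state the exact objective, so everything below is an assumption made for illustration: an InfoNCE-style loss with in-batch negatives, cosine similarity, and a temperature of 0.05. The function name `entity_contrastive_loss` and its parameters are hypothetical and are not taken from the EASE code base.

```python
import numpy as np


def entity_contrastive_loss(sent_emb, ent_emb, temperature=0.05):
    """InfoNCE-style loss with in-batch negatives (an assumed objective).

    sent_emb: (batch, dim) sentence embeddings.
    ent_emb:  (batch, dim) embeddings of the entity related to each sentence.
    Row i of sent_emb is treated as the positive pair of row i of ent_emb;
    every other entity in the batch acts as a negative.
    """
    # L2-normalise so that dot products become cosine similarities.
    s = sent_emb / np.linalg.norm(sent_emb, axis=1, keepdims=True)
    e = ent_emb / np.linalg.norm(ent_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = (s @ e.T) / temperature  # shape: (batch, batch)

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives sit on the diagonal; average their negative log-likelihood.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    entities = rng.normal(size=(4, 8))
    # Sentences that coincide with their entities should score a lower loss
    # than sentences paired with the wrong entities.
    print(entity_contrastive_loss(entities, entities))
    print(entity_contrastive_loss(entities, entities[::-1]))
```

Under these assumptions, the loss falls as each sentence embedding moves closer to its own entity and further from the other entities in the batch. If the entity embeddings are shared across languages, the same objective would also pull sentences in different languages toward a common point, which is one way to read the cross-lingual alignment claim in the abstract.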