Paper Title
Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank
Paper Authors
Paper Abstract
Detecting fine-grained differences in content conveyed in different languages matters for cross-lingual NLP and multilingual corpora analysis, but it is a challenging machine learning problem since annotation is expensive and hard to scale. This work improves the prediction and annotation of fine-grained semantic divergences. We introduce a training strategy for multilingual BERT models by learning to rank synthetic divergent examples of varying granularity. We evaluate our models on the Rationalized English-French Semantic Divergences, a new dataset released with this work, consisting of English-French sentence pairs annotated with semantic divergence classes and token-level rationales. Learning to rank helps detect fine-grained sentence-level divergences more accurately than a strong sentence-level similarity model, while token-level predictions have the potential to further distinguish between coarse and fine-grained divergences.
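To make the learning-to-rank idea concrete, the following is a minimal PyTorch sketch assuming a margin ranking objective over divergence scores produced by a multilingual BERT encoder with a scalar scoring head. The model name, scoring head, margin value, and toy sentence pairs are illustrative assumptions, not the paper's exact configuration.

# Minimal sketch (assumptions): train a multilingual BERT scorer so that a
# synthetically perturbed, more-divergent sentence pair receives a higher
# divergence score than a less-divergent one, via a margin ranking loss.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class DivergenceScorer(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        # Illustrative scoring head: one scalar divergence score per pair.
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, src, tgt, tokenizer):
        # Encode the sentence pair jointly and score from the [CLS] representation.
        enc = tokenizer(src, tgt, return_tensors="pt", padding=True, truncation=True)
        cls = self.encoder(**enc).last_hidden_state[:, 0]
        return self.head(cls).squeeze(-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = DivergenceScorer()
loss_fn = nn.MarginRankingLoss(margin=1.0)

# Toy synthetic example: the second French sentence is perturbed to be more
# divergent from the English source (a stand-in for synthetic divergence generation).
en = ["The cat sat on the mat."]
fr_close = ["Le chat était assis sur le tapis."]
fr_divergent = ["Le chien courait dans le jardin hier soir."]

score_close = model(en, fr_close, tokenizer)
score_divergent = model(en, fr_divergent, tokenizer)

# Target +1: the more-divergent pair should be ranked higher (larger score).
loss = loss_fn(score_divergent, score_close, torch.ones_like(score_close))
loss.backward()
print(float(loss))

In this sketch, examples of varying granularity would simply be additional ranked pairs (e.g. a fine-grained phrase substitution ranked above an exact translation, and a fully unrelated pair ranked above both), all trained with the same margin objective.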