Paper Title

Transfer Language Selection for Zero-Shot Cross-Lingual Abusive Language Detection

Authors

Juuso Eronen, Michal Ptaszynski, Fumito Masui, Masaki Arata, Gniewosz Leliwa, Michal Wroczynski

Abstract

We study the selection of transfer languages for automatic abusive language detection. Instead of preparing a dataset for every language, we demonstrate the effectiveness of cross-lingual transfer learning for zero-shot abusive language detection. This lets us use existing data from higher-resource languages to build better detection systems for low-resource languages. Our datasets cover seven languages from three language families. We measure the distance between languages using several language similarity measures, in particular by quantifying the World Atlas of Language Structures (WALS). We show that linguistic similarity correlates with classifier performance. This finding allows us to choose an optimal transfer language for zero-shot abusive language detection.
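
The selection idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the language names, the feature values (placeholders, not real WALS entries), the agreement-based similarity, the use of Pearson correlation, and the classifier scores (made-up numbers, not results from the paper). The paper may quantify WALS and measure correlation differently.

```python
from math import sqrt

# Hypothetical WALS-style categorical features per language.
# Names and values are placeholders, NOT actual WALS entries.
TOY_FEATURES = {
    "lang_a": {"f1": 1, "f2": 2, "f3": 1, "f4": 3},
    "lang_b": {"f1": 1, "f2": 2, "f3": 2, "f4": 3},
    "lang_c": {"f1": 2, "f2": 1, "f3": 2, "f4": 1},
    "target": {"f1": 1, "f2": 2, "f3": 1, "f4": 1},
}


def feature_similarity(a, b):
    """Fraction of shared features on which two languages agree (an assumed measure)."""
    shared = [name for name in a if name in b]
    if not shared:
        return 0.0
    return sum(a[name] == b[name] for name in shared) / len(shared)


def rank_transfer_languages(features, target):
    """Return (language, similarity) pairs, most similar to the target first."""
    sims = {
        lang: feature_similarity(vals, features[target])
        for lang, vals in features.items()
        if lang != target
    }
    return sorted(sims.items(), key=lambda item: item[1], reverse=True)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0  # correlation is undefined for constant input; 0.0 is a convention here
    return cov / (std_x * std_y)


ranking = rank_transfer_languages(TOY_FEATURES, "target")

# Made-up zero-shot scores for each source language, for illustration only.
toy_scores = {"lang_a": 0.70, "lang_b": 0.60, "lang_c": 0.50}
similarities = [sim for _, sim in ranking]
scores = [toy_scores[lang] for lang, _ in ranking]
correlation = pearson(similarities, scores)

best_language = ranking[0][0]
```

With real data, the similarity values would come from the chosen linguistic measure and the scores from classifiers trained on each source language and evaluated on the target; a strong positive correlation would then support picking the most similar language as the transfer source.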
