Paper Title

Evaluating Transformer-Based Multilingual Text Classification

Authors

Sophie Groenwold, Samhita Honnavalli, Lily Ou, Aesha Parekh, Sharon Levy, Diba Mirza, William Yang Wang

Abstract

As NLP tools become ubiquitous in today's technological landscape, they are increasingly applied to languages with a variety of typological structures. However, NLP research does not focus primarily on typological differences in its analysis of state-of-the-art language models. As a result, NLP tools perform unequally across languages with different syntactic and morphological structures. Through a detailed discussion of word order typology, morphological typology, and comparative linguistics, we identify which variables most affect language modeling efficacy; in addition, we calculate word order and morphological similarity indices to aid our empirical study. We then use this background to support our analysis of an experiment we conduct using multi-class text classification on eight languages and eight models.
