Spoken dialect identification in Twitter using a multi-filter architecture

Paper title

Spoken dialect identification in Twitter using a multi-filter architecture

Authors

Mohammadreza Banaei, Rémi Lebret, Karl Aberer

Abstract

This paper presents our approach for SwissText & KONVENS 2020 shared task 2, which is a multi-stage neural model for Swiss German (GSW) identification on Twitter. Our model outputs either GSW or non-GSW and is not meant to be used as a generic language identifier. Our architecture consists of two independent filters: the first favors recall and the second favors precision (both with respect to GSW). Moreover, our filters do not use binary models (GSW vs. non-GSW) but rather multi-class classifiers, with GSW being one of the possible labels. Our model reaches an F1-score of 0.982 on the test set of the shared task.
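The abstract does not give the details of the two filters, so the following is only a minimal sketch of the general idea: a lenient, recall-oriented stage followed by a strict, precision-oriented stage, each built on a multi-class classifier whose label set includes GSW. The classifier interface (a function returning a label-to-probability dict), the label strings, `top_k`, and both threshold values are hypothetical illustrations, not the authors' settings.

```python
from typing import Callable, Dict

# Hypothetical interface: a multi-class language classifier maps a tweet to a
# probability distribution over labels (e.g. "gsw", "de", "en", ...).
Classifier = Callable[[str], Dict[str, float]]

GSW = "gsw"  # hypothetical label name for Swiss German


def recall_filter(probs: Dict[str, float], threshold: float = 0.1, top_k: int = 3) -> bool:
    """Lenient first stage (assumed rule): keep the tweet as a GSW candidate
    if GSW is among the top-k labels or its probability exceeds a low threshold."""
    ranked = sorted(probs, key=probs.get, reverse=True)[:top_k]
    return GSW in ranked or probs.get(GSW, 0.0) >= threshold


def precision_filter(probs: Dict[str, float], threshold: float = 0.7) -> bool:
    """Strict second stage (assumed rule): GSW must be the single most likely
    label and its probability must exceed a high threshold."""
    best = max(probs, key=probs.get)
    return best == GSW and probs[GSW] >= threshold


def is_gsw(tweet: str, clf1: Classifier, clf2: Classifier) -> bool:
    """Output GSW only if the tweet passes both independently trained filters.
    The second classifier is only run on candidates kept by the first."""
    if not recall_filter(clf1(tweet)):
        return False
    return precision_filter(clf2(tweet))


if __name__ == "__main__":
    # Stub classifiers with made-up scores, standing in for trained models.
    lenient = lambda t: {"de": 0.7, "gsw": 0.2, "en": 0.1}
    strict = lambda t: {"gsw": 0.9, "de": 0.1}
    print(is_gsw("Grüezi mitenand", lenient, strict))
```

Because each stage is multi-class, a tweet that the first classifier scores mainly as Standard German (`de`) can still survive the lenient stage when GSW ranks near the top, while the strict stage requires GSW to win outright. This reflects the recall-then-precision split described in the abstract.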
