Paper title
The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion
Paper authors
Abstract
In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology. Participants were asked to submit systems which take raw text and a list of lemmas as input, and output all inflected forms, i.e., the entire morphological paradigm, of each lemma. In order to simulate a realistic use case, we first released data for 5 development languages. However, systems were officially evaluated on 9 surprise languages, which were only revealed a few days before the submission deadline. We provided a modular baseline system, which is a pipeline of 4 components. Three teams submitted a total of 7 systems, but, surprisingly, none of the submitted systems was able to improve over the baseline on average over all 9 test languages. Only on 3 languages did a submitted system obtain the best results. This shows that unsupervised morphological paradigm completion is still largely unsolved. We present an analysis here, so that this shared task will ground further research on the topic.