Evaluating Neural Morphological Taggers for Sanskrit

Title

Evaluating Neural Morphological Taggers for Sanskrit

Authors

Ashim Gupta, Amrith Krishna, Pawan Goyal, Oliver Hellwig

Abstract

Neural sequence labelling approaches have achieved state-of-the-art results in morphological tagging. We evaluate the efficacy of four standard sequence labelling models on Sanskrit, a morphologically rich, fusional Indian language. As its label space can theoretically contain more than 40,000 labels, systems that explicitly model the internal structure of a label are better suited to the task, because of their ability to generalise to labels not seen during training. We find that although some neural models perform better than others, one of the common causes of error across all of these models is misprediction due to syncretism.
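
The generalisation argument in the abstract can be illustrated with a minimal sketch. It is not the paper's tagset or any of the four evaluated models; the feature values and the three "training" tags below are hypothetical toy data. It only shows why a tagger that treats each tag as an atomic symbol is limited to tags seen in training, while one that predicts each feature separately can compose unseen combinations.

```python
from itertools import product

# Hypothetical training tags (assumption, not from the paper):
# each tag is a bundle of (case, number, gender) values.
train_tags = [
    ("Nom", "Sg", "Masc"),
    ("Acc", "Pl", "Fem"),
    ("Ins", "Sg", "Neut"),
]

# Monolithic view: a tag is one atomic label, so only tags
# observed during training can ever be predicted.
monolithic_labels = set(train_tags)

# Factored view: keep one value inventory per feature and
# allow any combination of observed values.
feature_values = [sorted({tag[i] for tag in train_tags}) for i in range(3)]
factored_labels = set(product(*feature_values))

# A combination that never occurs in the toy training data.
unseen = ("Nom", "Pl", "Neut")

print(unseen in monolithic_labels)  # False
print(unseen in factored_labels)    # True
print(len(monolithic_labels), len(factored_labels))  # 3 18
```

With only three observed tags, the factored inventory already covers 18 combinations (3 cases, 2 numbers, 3 genders), which is the same effect that makes a very large theoretical label space tractable for models that use the label's internal structure.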
