Paper title
An Experimental Investigation of Part-Of-Speech Taggers for Vietnamese
Paper authors
Paper abstract
Part-of-speech (POS) tagging plays an important role in Natural Language Processing (NLP). Its applications can be found in many NLP tasks such as named entity recognition, syntactic parsing, dependency parsing and text chunking. In this paper, we use two widely used toolkits, ClearNLP and Stanford POS Tagger, to develop two new POS taggers for Vietnamese, and then compare them with three well-known Vietnamese taggers, namely JVnTagger, vnTagger and RDRPOSTagger. We make a systematic comparison to find the tagger with the best performance. We also design a new feature set to measure the performance of the statistical taggers. Our new taggers, built from Stanford Tagger and ClearNLP with the new feature set, outperform all other current Vietnamese taggers in terms of tagging accuracy. Moreover, we analyze the effect of some features on the performance of statistical taggers. Lastly, the experimental results reveal that the transformation-based tagger, RDRPOSTagger, runs significantly faster than any of the statistical taggers.