Machine Learning Approaches for Amharic Parts-of-speech Tagging

Paper Title

Machine Learning Approaches for Amharic Parts-of-speech Tagging

Authors

Gashaw, Ibrahim; Shashirekha, H L.

Abstract

Part-of-speech (POS) tagging is considered one of the basic but necessary tools required for many Natural Language Processing (NLP) applications, such as word sense disambiguation, information retrieval, information processing, parsing, question answering, and machine translation. The performance of current POS taggers for Amharic is not as good as that of contemporary POS taggers available for English and other European languages. The aim of this work is to improve POS tagging performance for the Amharic language, which has never exceeded 91%. The use of morphological knowledge, an extension of the existing annotated data, feature extraction, parameter tuning by grid search, and the choice of tagging algorithm have been examined, and a significant performance difference from previous work has been obtained. We used three different datasets for the POS experiments.
