Paper title

XSTEM: An exemplar-based stemming algorithm

Author

Baker, Kirk

Abstract

Stemming is the process of reducing related words to a standard form by removing affixes from them. Existing algorithms vary with respect to their complexity, configurability, handling of unknown words, and ability to avoid under- and over-stemming. This paper presents a fast, simple, configurable, high-precision, high-recall stemming algorithm that combines the simplicity and performance of word-based lookup tables with the strong generalizability of rule-based methods to avert problems with out-of-vocabulary words.
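
The abstract describes the combination of a word-based lookup table with rule-based generalization only at a high level. As a rough illustration of that general idea, and not the XSTEM algorithm itself, the sketch below consults a small table first and falls back to suffix rules for out-of-vocabulary words. The table entries, the rule list, the `MIN_STEM_LEN` threshold, and the function name `stem` are all assumptions made up for this example.

```python
# Hypothetical illustration only: this is NOT the XSTEM algorithm from the paper.

# Assumed toy lookup table mapping known words to their stems.
EXEMPLARS = {
    "running": "run",
    "ran": "run",
    "studies": "study",
}

# Assumed toy suffix rules (suffix, replacement), tried in order.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

# Assumed guard against over-stemming: keep at least this many characters.
MIN_STEM_LEN = 3


def stem(word: str) -> str:
    """Return a stem via table lookup, falling back to suffix rules."""
    word = word.lower()
    # 1. Known words: the lookup table wins, which handles irregular forms.
    if word in EXEMPLARS:
        return EXEMPLARS[word]
    # 2. Out-of-vocabulary words: apply the first suffix rule that matches
    #    and leaves a sufficiently long remainder.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: len(word) - len(suffix)] + replacement
    # 3. Nothing applies: return the word unchanged.
    return word
```

In this sketch the table handles irregular forms such as "ran", while the rules cover unseen words such as "walked"; the length guard leaves short words like "bus" untouched.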
