Paper Title

Combining Pretrained High-Resource Embeddings and Subword Representations for Low-Resource Languages

Authors

Machel Reid, Edison Marrese-Taylor, Yutaka Matsuo

Abstract

The contrast between the need for large amounts of data for current Natural Language Processing (NLP) techniques, and the lack thereof, is accentuated in the case of African languages, most of which are considered low-resource. To help circumvent this issue, we explore techniques exploiting the qualities of morphologically rich languages (MRLs), while leveraging pretrained word vectors in well-resourced languages. In our exploration, we show that a meta-embedding approach combining both pretrained and morphologically-informed word embeddings performs best in the downstream task of Xhosa-English translation.
