Paper Title

Unsupervised Pidgin Text Generation By Pivoting English Data and Self-Training

Authors

Ernie Chang, David Ifeoluwa Adelani, Xiaoyu Shen, Vera Demberg

Abstract

West African Pidgin English is widely spoken in West Africa, with at least 75 million speakers. Nevertheless, proper machine translation systems and relevant NLP datasets for Pidgin English are virtually absent. In this work, we develop techniques for bridging the gap between Pidgin English and English in the context of natural language generation. As a proof of concept, we explore the proposed techniques in the area of data-to-text generation. Building on previously released monolingual Pidgin English text and a parallel English data-to-text corpus, we aim to build a system that can automatically generate Pidgin English descriptions from structured data. We first train a data-to-English text generation system, then employ techniques from unsupervised neural machine translation and self-training to establish the Pidgin-to-English cross-lingual alignment. Human evaluation of the generated Pidgin texts shows that, although the system is still far from practically usable, the pivoting + self-training technique improves both Pidgin text fluency and relevance.
