Paper Title
Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences
Paper Authors
Abstract
Acknowledged as one of the most successful online collaborative projects in human society, Wikipedia has achieved rapid growth in recent years and continually seeks to expand its content and disseminate knowledge to everyone around the globe. A shortage of volunteers causes many problems for Wikipedia, including developing content for its more than 300 language editions at present. Therefore, the benefit of machines automatically generating content to reduce human effort across Wikipedia language projects could be considerable. In this paper, we propose a mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level. The main step is to organize the statements, represented as a group of quadruples and triples, and then map them to corresponding sentences in English Wikipedia. We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models. The results are helpful not only for the data-to-text generation task but also for other related work in the field.
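To illustrate the statement representation the abstract mentions, the following is a minimal sketch of Wikidata statements as triples (subject, property, object) and quadruples (with an additional qualifier), together with a deliberately naive template verbalization. All names, example values, and the template itself are hypothetical and for illustration only; the paper's task is to map statements to existing English Wikipedia sentences, not to generate text from templates.

```python
from typing import NamedTuple, Optional

# Hypothetical, simplified containers; the paper's actual
# representation of Wikidata statements may differ.
class Triple(NamedTuple):
    subject: str   # Wikidata item label, e.g. "Douglas Adams"
    prop: str      # property label, e.g. "place of birth"
    obj: str       # object label, e.g. "Cambridge"

class Quadruple(NamedTuple):
    subject: str
    prop: str
    obj: str
    qualifier: str  # qualifier value, e.g. a point in time

def naive_sentence(t: Triple, qualifier: Optional[str] = None) -> str:
    """Naive template-based rendering, used only to show the shape
    of the data; it is not the mapping procedure from the paper."""
    s = f"{t.subject} has {t.prop} {t.obj}"
    if qualifier:
        s += f" ({qualifier})"
    return s + "."

t = Triple("Douglas Adams", "place of birth", "Cambridge")
print(naive_sentence(t))  # Douglas Adams has place of birth Cambridge.
```

In the actual task, such structured statements would be aligned against real Wikipedia sentences rather than rendered from a template.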