Automatically Ranked Russian Paraphrase Corpus for Text Generation

Paper Title

Automatically Ranked Russian Paraphrase Corpus for Text Generation

Authors

Vadim Gudkov, Olga Mitrofanova, Elizaveta Filippskikh

Abstract

The article focuses on the automatic development and ranking of a large corpus for Russian paraphrase generation, which proves to be the first corpus of its kind in Russian computational linguistics. Existing manually annotated paraphrase datasets for Russian are limited to the small ParaPhraser corpus and ParaPlag, which are suitable for a range of NLP tasks such as paraphrase and plagiarism detection, sentence similarity and relatedness estimation, etc. Because of their size, these datasets can hardly be applied in end-to-end text generation solutions, whereas paraphrase generation requires a large amount of training data. In our study we propose a solution to this problem: we collect, rank and evaluate a new publicly available headline paraphrase corpus (ParaPhraser Plus), and then perform text generation experiments with manual evaluation on the automatically ranked corpora using the Universal Transformer architecture.
