Paper Title
An Empirical Investigation of Pre-Trained Transformer Language Models for Open-Domain Dialogue Generation
Paper Authors
Paper Abstract
We present an empirical investigation of pre-trained Transformer-based auto-regressive language models for the task of open-domain dialogue generation. The pre-training and fine-tuning paradigm is employed for parameter learning. Corpora of news and Wikipedia in Chinese and English are collected for the pre-training stage, respectively. The dialogue context and response are concatenated into a single sequence and used as the model input during the fine-tuning stage. A weighted joint prediction paradigm for both context and response is designed to evaluate the performance of models with or without the loss term for context prediction. Various decoding strategies, such as greedy search, beam search, and top-k sampling, are employed for response text generation. Extensive experiments are conducted on typical single-turn and multi-turn dialogue corpora, including Weibo, Douban, Reddit, DailyDialog, and Persona-Chat. Detailed automatic evaluation results on the relevance and diversity of the generated responses are reported for the language models as well as the baseline approaches.
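The weighted joint prediction paradigm mentioned in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption for clarity rather than the authors' released code: the function name `weighted_joint_lm_loss`, its arguments, and the specific weighting scheme (a single scalar `alpha` on the context-prediction term, with `alpha = 0` recovering response-only prediction) are hypothetical choices consistent with the description of concatenating context and response into one input sequence.

```python
import torch
import torch.nn.functional as F

def weighted_joint_lm_loss(logits, input_ids, context_len, alpha=0.0):
    """Weighted joint language-modeling loss over a [context ; response] sequence.

    logits:      (seq_len, vocab_size) next-token logits from the language model
    input_ids:   (seq_len,) token ids of the concatenated context + response
    context_len: number of tokens belonging to the dialogue context
    alpha:       weight on the context-prediction loss term (hypothetical;
                 alpha = 0 disables the context loss entirely)
    """
    # Shift so that the logits at position t predict the token at position t + 1.
    shift_logits = logits[:-1, :]
    shift_labels = input_ids[1:]

    # Per-token cross-entropy without reduction, so we can weight spans separately.
    token_loss = F.cross_entropy(shift_logits, shift_labels, reduction="none")

    # Positions before (context_len - 1) predict context tokens; the rest predict response tokens.
    is_context = torch.arange(shift_labels.size(0), device=token_loss.device) < (context_len - 1)

    context_loss = token_loss[is_context].mean() if is_context.any() else token_loss.new_tensor(0.0)
    response_loss = token_loss[~is_context].mean()

    # Weighted combination of the two prediction objectives.
    return alpha * context_loss + response_loss
```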