Paper Title
AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation
Authors
Abstract
Crowdsourced dialogue corpora are usually limited in scale and topic coverage due to the expensive cost of data curation. This would hinder the generalization of downstream dialogue models to open-domain topics. In this work, we leverage large language models for dialogue augmentation in the task of emotional support conversation (ESC). By treating dialogue augmentation as a dialogue completion task, we prompt a fine-tuned language model to complete full dialogues from available dialogue posts of various topics, which are then postprocessed based on heuristics. Applying this approach, we construct AugESC, an augmented dataset for the ESC task, which largely extends the scale and topic coverage of the crowdsourced ESConv corpus. Through comprehensive human evaluation, we demonstrate that our approach is superior to strong baselines of dialogue augmentation and that AugESC has comparable dialogue quality to the crowdsourced corpus. We also conduct human interactive evaluation and prove that post-training on AugESC improves downstream dialogue models' generalization ability to open-domain topics. These results suggest the utility of AugESC and highlight the potential of large language models in improving data-scarce dialogue generation tasks.
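The pipeline the abstract outlines (treat augmentation as dialogue completion, then postprocess with heuristics) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the "Seeker"/"Supporter" role labels, the prompt layout, the `complete` callable standing in for the fine-tuned language model, and the three filtering rules (minimum length, no empty utterances, strict speaker alternation).

```python
from typing import Callable, List, Optional, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)
SPEAKERS = ("Seeker", "Supporter")  # hypothetical role labels


def build_prompt(post: str) -> str:
    """Use the post as the first seeker turn and leave the rest to the model."""
    return f"Seeker: {post.strip()}\nSupporter:"


def parse_dialogue(text: str) -> List[Turn]:
    """Split 'Speaker: utterance' lines into turns, skipping unrecognized lines."""
    turns: List[Turn] = []
    for line in text.splitlines():
        speaker, sep, utterance = line.partition(":")
        if sep and speaker.strip() in SPEAKERS:
            turns.append((speaker.strip(), utterance.strip()))
    return turns


def passes_heuristics(turns: List[Turn], min_turns: int = 4) -> bool:
    """Illustrative filters only; the paper's actual heuristics are not given here."""
    if len(turns) < min_turns:
        return False
    if any(not utterance for _, utterance in turns):
        return False
    # Adjacent turns must come from different speakers.
    return all(a[0] != b[0] for a, b in zip(turns, turns[1:]))


def augment(
    post: str,
    complete: Callable[[str], str],  # stand-in for the fine-tuned language model
    min_turns: int = 4,
) -> Optional[List[Turn]]:
    """Complete a full dialogue from one post; return None if it is filtered out."""
    prompt = build_prompt(post)
    turns = parse_dialogue(prompt + complete(prompt))
    return turns if passes_heuristics(turns, min_turns) else None
```

In use, `complete` would wrap whatever generation call the chosen model exposes, and `augment` would be mapped over a collection of posts on varied topics, keeping only the dialogues that are not `None`.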