Paper title
Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology
Paper authors
Paper abstract
Conversational interfaces are increasingly popular as a way of connecting people to information. Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on recognizing and classifying inappropriate content have mostly focused on a single category of malevolence or on single sentences rather than an entire dialogue. In this paper, we define the task of Malevolent Dialogue Response Detection and Classification (MDRDC). We make three contributions to advance research on this task. First, we present a Hierarchical Malevolent Dialogue Taxonomy (HMDT). Second, we create a labelled multi-turn dialogue dataset and formulate the MDRDC task as a hierarchical classification task over this taxonomy. Third, we apply state-of-the-art text classification methods to the MDRDC task and report on extensive experiments aimed at assessing the performance of these approaches.