ConCET: Entity-Aware Topic Classification for Open-Domain Conversational Agents

Paper Title

ConCET: Entity-Aware Topic Classification for Open-Domain Conversational Agents

Authors

Ali Ahmadvand, Harshita Sahijwani, Jason Ingyu Choi, Eugene Agichtein

Abstract

Identifying the topic (domain) of each user's utterance in open-domain conversational systems is a crucial step for all subsequent language understanding and response tasks. In particular, for complex domains, an utterance is often routed to a single component responsible for that domain. Thus, correctly mapping a user utterance to the right domain is critical. To address this problem, we introduce ConCET: a Concurrent Entity-aware conversational Topic classifier, which incorporates entity-type information together with the utterance content features. Specifically, ConCET utilizes entity information to enrich the utterance representation, combining character, word, and entity-type embeddings into a single representation. However, for rich domains with millions of available entities, unrealistic amounts of labeled training data would be required. To complement our model, we propose a simple and effective method for generating synthetic training data, to augment the typically limited amounts of labeled training data, using commonly available knowledge bases to generate additional labeled utterances. We extensively evaluate ConCET and our proposed training method first on an openly available human-human conversational dataset called Self-Dialogue, to calibrate our approach against previous state-of-the-art methods; second, we evaluate ConCET on a large dataset of human-machine conversations with real users, collected as part of the Amazon Alexa Prize. Our results show that ConCET significantly improves topic classification performance on both datasets, including 8-10% improvements over state-of-the-art deep learning methods. We complement our quantitative results with detailed analysis of system performance, which could be used for further improvements of conversational agents.
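The two ideas in the abstract, generating labeled utterances from a knowledge base and combining character, word, and entity-type features into one representation, can be illustrated with a small sketch. This is not the authors' implementation. The toy knowledge base, the templates, the hash-based `toy_embedding` stand-in for learned embeddings, the substring entity matcher, and the mean pooling are all assumptions chosen only to keep the example self-contained.

```python
import hashlib

# Hypothetical toy knowledge base: entity name -> (entity type, topic label).
TOY_KB = {
    "Inception": ("film", "Movies"),
    "Taylor Swift": ("musician", "Music"),
    "Lakers": ("sports_team", "Sports"),
}

# Hypothetical utterance templates per topic (not taken from the paper).
TEMPLATES = {
    "Movies": ["tell me about {entity}", "who directed {entity}"],
    "Music": ["play something by {entity}", "tell me about {entity}"],
    "Sports": ["did the {entity} win", "tell me about {entity}"],
}


def generate_synthetic_utterances(kb, templates):
    """Fill each topic's templates with every KB entity of that topic."""
    examples = []
    for entity, (_entity_type, topic) in kb.items():
        for template in templates[topic]:
            examples.append((template.format(entity=entity), topic))
    return examples


def toy_embedding(token, dim):
    """Deterministic stand-in for a learned embedding lookup (assumption)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def mean_pool(vectors, dim):
    """Average a list of vectors; return zeros when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def represent(utterance, kb, char_dim=4, word_dim=8, type_dim=4):
    """Concatenate pooled character, word, and entity-type vectors."""
    text = utterance.lower()
    char_vec = mean_pool(
        [toy_embedding("c:" + c, char_dim) for c in text if not c.isspace()],
        char_dim,
    )
    word_vec = mean_pool(
        [toy_embedding("w:" + w, word_dim) for w in text.split()], word_dim
    )
    # Naive substring matching stands in for a real entity linker.
    types = [etype for name, (etype, _) in kb.items() if name.lower() in text]
    type_vec = mean_pool(
        [toy_embedding("t:" + t, type_dim) for t in types], type_dim
    )
    return char_vec + word_vec + type_vec


synthetic = generate_synthetic_utterances(TOY_KB, TEMPLATES)
features = represent("who directed Inception", TOY_KB)
```

In this sketch, an utterance with no recognized entity gets an all-zero entity-type segment, so a downstream classifier would fall back on the character and word features alone.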
