Efficient Deployment of Conversational Natural Language Interfaces over Databases

Paper Title

Efficient Deployment of Conversational Natural Language Interfaces over Databases

Authors

Anthony Colas, Trung Bui, Franck Dernoncourt, Moumita Sinha, Doo Soon Kim

Abstract

Many users communicate with chatbots and AI assistants to get help with various tasks. A key component of such an assistant is the ability to understand and answer a user's natural language questions, i.e., question answering (QA). Because data can usually be stored in a structured manner, an essential step involves turning a natural language question into a corresponding query in a query language. However, training most state-of-the-art natural language-to-query-language (NL-to-QL) models first requires a large amount of training data. In most domains, this data is not available, and collecting such datasets for various domains can be tedious and time-consuming. In this work, we propose a novel method for accelerating training dataset collection for developing NL-to-QL machine learning models. Our system allows one to generate conversational multi-turn data, where multiple turns define a dialogue session, enabling one to better utilize chatbot interfaces. We train two current state-of-the-art NL-to-QL models on both SQL-based and SPARQL-based datasets to showcase the adaptability and efficacy of our created data.
