Paper Title


Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems

Paper Authors

Andrea Madotto, Zihan Liu, Zhaojiang Lin, Pascale Fung

Paper Abstract


Task-oriented dialogue systems use four connected modules: Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy (DP), and Natural Language Generation (NLG). A research challenge is to learn each module with the fewest samples possible (i.e., few-shot), given the high cost of data collection. The most common and effective technique for this problem is transfer learning, where large language models, pre-trained either on text or on task-specific data, are fine-tuned on the few samples. These methods require fine-tuning steps and a separate set of parameters for each task. Differently, language models such as GPT-2 (Radford et al., 2019) and GPT-3 (Brown et al., 2020) allow few-shot learning by priming the model with a few examples. In this paper, we evaluate the priming few-shot ability of language models on the NLU, DST, DP, and NLG tasks. Importantly, we highlight the current limitations of this approach and discuss possible implications for future work.
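The "priming" approach the abstract contrasts with fine-tuning amounts to conditioning the model on a few labeled demonstrations followed by the new query, all in a single text prompt. A minimal sketch of how such a prompt could be assembled for the NLU (intent detection) task is shown below; the function name `build_priming_prompt`, the prompt template, and the example intents are illustrative assumptions, not the paper's exact format.

```python
# Sketch of few-shot "priming" for the NLU task: rather than updating any
# parameters, a left-to-right language model is conditioned on a few
# (utterance, intent) demonstrations plus the query utterance. The model
# is then expected to complete the final "Intent:" line.
# The template and example data here are hypothetical illustrations.

def build_priming_prompt(examples, query):
    """Concatenate labeled demonstrations and the unlabeled query
    into a single prompt string."""
    lines = []
    for utterance, intent in examples:
        lines.append(f"Utterance: {utterance}\nIntent: {intent}")
    # The query ends with an open "Intent:" slot for the model to fill.
    lines.append(f"Utterance: {query}\nIntent:")
    return "\n\n".join(lines)

demos = [
    ("book a table for two at 7pm", "restaurant_reservation"),
    ("what's the weather in Paris tomorrow", "weather_query"),
]
prompt = build_priming_prompt(demos, "find me a flight to Hong Kong")
print(prompt)
```

The same pattern extends to the DST, DP, and NLG tasks by changing what the demonstrations pair the utterance with (a belief state, a system action, or a natural-language response), which is why no per-task fine-tuned parameters are needed.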
