Paper title
End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features
Paper authors
Paper abstract
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further investigation. In this paper, we introduce a modular end-to-end (E2E) SLU architecture based on transformer networks, which allows the use of self-supervised pre-trained acoustic features, pre-trained model initialization, and multi-task training. Several SLU experiments predicting intent and entity labels/values on the ATIS dataset are performed. These experiments investigate the interaction of pre-trained model initialization and multi-task training with either traditional filterbank or self-supervised pre-trained acoustic features. Results show not only that self-supervised pre-trained acoustic features outperform filterbank features in almost all experiments, but also that, when these features are combined with multi-task training, they almost eliminate the need for pre-trained model initialization.
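To make the multi-task setup described above concrete, the sketch below shows one plausible way to attach an utterance-level intent head and a frame-level entity head to a shared transformer encoder over acoustic features. This is a minimal illustrative sketch, not the authors' implementation: the class name `MultiTaskSLU`, all layer sizes, the number of intents/entity tags, and the simple mean-pooling and loss weighting are assumptions for illustration only.

```python
# Minimal sketch (assumptions, not the paper's model): a transformer encoder
# over acoustic feature frames with two task heads, illustrating multi-task
# training for intent classification and entity tagging.
import torch
import torch.nn as nn


class MultiTaskSLU(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, n_heads=4, n_layers=6,
                 n_intents=26, n_entity_tags=120):
        super().__init__()
        # Project input acoustic features (e.g. filterbank frames or
        # self-supervised representations) into the transformer dimension.
        self.input_proj = nn.Linear(feat_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Task heads: utterance-level intent and frame-level entity tags.
        self.intent_head = nn.Linear(d_model, n_intents)
        self.entity_head = nn.Linear(d_model, n_entity_tags)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) acoustic feature frames
        h = self.encoder(self.input_proj(feats))
        intent_logits = self.intent_head(h.mean(dim=1))  # pooled over time
        entity_logits = self.entity_head(h)              # per frame
        return intent_logits, entity_logits


# Multi-task training sums both losses on the shared encoder (dummy data).
model = MultiTaskSLU()
feats = torch.randn(2, 300, 80)                  # batch of 2 utterances
intent_logits, entity_logits = model(feats)
intent_loss = nn.CrossEntropyLoss()(intent_logits, torch.tensor([3, 7]))
entity_loss = nn.CrossEntropyLoss()(
    entity_logits.reshape(-1, 120), torch.randint(0, 120, (2 * 300,)))
loss = intent_loss + entity_loss
loss.backward()
```

In such a setup, pre-trained model initialization would correspond to loading encoder weights trained on another task before fine-tuning, whereas the multi-task objective shown here trains both heads jointly from the outset.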