Paper Title
End-to-End Spoken Language Understanding Without Full Transcripts
Paper Authors
Paper Abstract
An essential component of spoken language understanding (SLU) is slot filling: representing the meaning of a spoken utterance using semantic entity labels. In this paper, we develop end-to-end (E2E) spoken language understanding systems that directly convert speech input to semantic entities and investigate if these E2E SLU models can be trained solely on semantic entity annotations without word-for-word transcripts. Training such models is very useful as they can drastically reduce the cost of data collection. We created two types of such speech-to-entities models, a CTC model and an attention-based encoder-decoder model, by adapting models trained originally for speech recognition. Given that our experiments involve speech input, these systems need to recognize both the entity label and words representing the entity value correctly. For our speech-to-entities experiments on the ATIS corpus, both the CTC and attention models showed impressive ability to skip non-entity words: there was little degradation when trained on just entities versus full transcripts. We also explored the scenario where the entities are in an order not necessarily related to spoken order in the utterance. With its ability to do re-ordering, the attention model did remarkably well, achieving only about 2% degradation in speech-to-bag-of-entities F1 score.
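To make the setup concrete, here is a minimal Python sketch (not the paper's code) of two ideas from the abstract: collapsing a BIO slot-annotated transcript into entity-only training targets (label plus value words, with non-entity words dropped), and scoring a hypothesis against a reference with an order-independent bag-of-entities F1. The helper functions and the ATIS-style slot names are illustrative assumptions.

```python
from collections import Counter

def entity_targets(tokens, bio_tags):
    """Collapse a BIO-tagged transcript into (label, value) entity pairs,
    dropping all non-entity words."""
    entities, label, words = [], None, []
    for tok, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):
            if label:
                entities.append((label, " ".join(words)))
            label, words = tag[2:], [tok]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(tok)
        else:  # "O" tag (or inconsistent I- tag): close any open entity
            if label:
                entities.append((label, " ".join(words)))
            label, words = None, []
    if label:
        entities.append((label, " ".join(words)))
    return entities

def bag_of_entities_f1(hyp, ref):
    """Order-independent F1 over (label, value) pairs, counted as multisets."""
    hyp_c, ref_c = Counter(hyp), Counter(ref)
    tp = sum((hyp_c & ref_c).values())  # matched pairs, regardless of order
    if tp == 0:
        return 0.0
    precision = tp / sum(hyp_c.values())
    recall = tp / sum(ref_c.values())
    return 2 * precision * recall / (precision + recall)

# Illustrative ATIS-style example: "show me flights from boston to denver"
tokens = "show me flights from boston to denver".split()
tags = ["O", "O", "O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name"]
ref = entity_targets(tokens, tags)
# A hypothetical model output with the same entities in a different order
hyp = [("toloc.city_name", "denver"), ("fromloc.city_name", "boston")]
print(ref)   # [('fromloc.city_name', 'boston'), ('toloc.city_name', 'denver')]
print(bag_of_entities_f1(hyp, ref))  # 1.0 -- entity order does not matter
```

The bag-of-entities metric reflects the abstract's reordered-entities scenario: since the hypothesis and reference are compared as multisets of (label, value) pairs, a model that emits the correct entities in a different order is not penalized.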