Paper Title

Transformer based Grapheme-to-Phoneme Conversion

Paper Authors

Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth

Abstract

The attention mechanism is one of the most successful techniques in deep-learning-based Natural Language Processing (NLP). The transformer network architecture is based entirely on attention mechanisms, and it outperforms sequence-to-sequence models in neural machine translation without recurrent or convolutional layers. Grapheme-to-phoneme (G2P) conversion is the task of converting letters (a grapheme sequence) into their pronunciation (a phoneme sequence). It plays a significant role in text-to-speech (TTS) and automatic speech recognition (ASR) systems. In this paper, we investigate the application of the transformer architecture to G2P conversion and compare its performance with recurrent and convolutional neural network based approaches. Phoneme and word error rates are evaluated on the CMUDict dataset for US English and the NetTalk dataset. The results show that transformer-based G2P outperforms the convolution-based approach in terms of word error rate, and our results significantly exceed previous recurrent approaches (without attention) in both word and phoneme error rates on both datasets. Furthermore, the proposed model is much smaller than the previous models.
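
To make the setup concrete, below is a minimal sketch of a transformer sequence-to-sequence model for G2P, written against PyTorch's built-in nn.Transformer. The vocabulary sizes, layer counts, dimensions, and token ids are illustrative assumptions for exposition; they are not the architecture or hyperparameters reported in the paper.

```python
# Illustrative transformer-based G2P sketch (PyTorch); not the paper's exact model.
import torch
import torch.nn as nn

class TransformerG2P(nn.Module):
    def __init__(self, n_graphemes, n_phonemes, d_model=128, nhead=4,
                 num_layers=3, dim_feedforward=512, max_len=64):
        super().__init__()
        self.src_embed = nn.Embedding(n_graphemes, d_model)   # grapheme embeddings
        self.tgt_embed = nn.Embedding(n_phonemes, d_model)    # phoneme embeddings
        self.pos_embed = nn.Embedding(max_len, d_model)       # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward, batch_first=True)
        self.out = nn.Linear(d_model, n_phonemes)

    def forward(self, src, tgt):
        # src: (batch, src_len) grapheme ids; tgt: (batch, tgt_len) phoneme ids
        src_pos = torch.arange(src.size(1), device=src.device)
        tgt_pos = torch.arange(tgt.size(1), device=tgt.device)
        src_emb = self.src_embed(src) + self.pos_embed(src_pos)
        tgt_emb = self.tgt_embed(tgt) + self.pos_embed(tgt_pos)
        # Causal mask: each output phoneme may only attend to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(src.device)
        h = self.transformer(src_emb, tgt_emb, tgt_mask=tgt_mask)
        return self.out(h)  # (batch, tgt_len, n_phonemes) logits

# Toy usage: the word "cat" -> CMUdict-style phonemes "K AE T", with made-up ids.
model = TransformerG2P(n_graphemes=30, n_phonemes=45)
graphemes = torch.tensor([[3, 1, 20]])   # hypothetical ids for c, a, t
phonemes = torch.tensor([[11, 2, 33]])   # hypothetical ids for K, AE, T
logits = model(graphemes, phonemes)
print(logits.shape)  # torch.Size([1, 3, 45])
```

In practice such a model is trained with teacher forcing and cross-entropy over the phoneme vocabulary, and pronunciations are decoded autoregressively (e.g., greedy or beam search) at inference time.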
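The two reported metrics can likewise be sketched under their usual definitions: phoneme error rate (PER) as the Levenshtein distance between predicted and reference phoneme sequences divided by the total reference length, and word error rate (WER) as the fraction of words whose predicted pronunciation does not exactly match the reference. The helper functions and example data below are illustrative and not taken from the paper's evaluation code.

```python
# Minimal PER/WER sketch under the standard definitions (illustrative only).
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def per_and_wer(predictions, references):
    """predictions, references: lists of phoneme sequences (lists of strings)."""
    edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    ref_len = sum(len(r) for r in references)
    wrong_words = sum(p != r for p, r in zip(predictions, references))
    return edits / ref_len, wrong_words / len(references)

# Toy example with CMUdict-style phonemes.
preds = [["K", "AE", "T"], ["D", "AO", "G", "Z"]]
refs  = [["K", "AE", "T"], ["D", "AO", "G"]]
per, wer = per_and_wer(preds, refs)
print(per, wer)  # 1 edit / 6 phonemes ~ 0.167, 1 of 2 words wrong = 0.5
```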
