Recent Developments on ESPnet Toolkit Boosted by Conformer

Paper title

Recent Developments on ESPnet Toolkit Boosted by Conformer

Authors

Pengcheng Guo, Florian Boyer, Xuankai Chang, Tomoki Hayashi, Yosuke Higuchi, Hirofumi Inaguma, Naoyuki Kamo, Chenda Li, Daniel Garcia-Romero, Jiatong Shi, Jing Shi, Shinji Watanabe, Kun Wei, Wangyou Zhang, Yuekai Zhang

Abstract

In this study, we present recent developments in ESPnet, the End-to-End Speech Processing toolkit, which mainly involve a recently proposed architecture called Conformer, a convolution-augmented Transformer. This paper shows results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translation (ST), speech separation (SS), and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive with, or even outperform, current state-of-the-art Transformer models. We are preparing to release all-in-one recipes using open source and publicly available corpora for all the above tasks, together with pre-trained models. Our aim for this work is to contribute to our research community by reducing the burden of preparing state-of-the-art research environments, which usually require substantial resources.
