Deep Reinforcement Learning with Pre-training for Time-efficient Training of Automatic Speech Recognition

Paper title

Deep Reinforcement Learning with Pre-training for Time-efficient Training of Automatic Speech Recognition

Authors

Rajapakshe, Thejan; Latif, Siddique; Rana, Rajib; Khalifa, Sara; Schuller, Björn W.

Abstract

Deep reinforcement learning (deep RL) combines deep learning with reinforcement learning principles to create efficient methods that learn by interacting with their environment. This has led to breakthroughs in many complex tasks that were previously difficult to solve, such as playing the game "Go". However, deep RL requires significant training time, which makes it difficult to use in various real-life applications such as Human-Computer Interaction (HCI). In this paper, we study pre-training in deep RL to reduce the training time and improve the performance of speech recognition, a popular application of HCI. To evaluate the performance improvement in training, we use the publicly available "Speech Command" dataset, which contains utterances of 30 command keywords spoken by 2,618 speakers. Results show that pre-training with deep RL offers faster convergence compared to non-pre-trained RL while achieving improved speech recognition accuracy.

Paper title