Paper Title
Impact of Dataset on Acoustic Models for Automatic Speech Recognition
Paper Authors
Paper Abstract
In Automatic Speech Recognition, GMM-HMM systems have been widely used for acoustic modelling. With recent advances in deep learning, the Gaussian Mixture Model (GMM) component of acoustic models has been replaced by Deep Neural Networks, giving rise to DNN-HMM acoustic models. GMM models are still widely used to create the alignments of the training data for the hybrid deep neural network models, which makes producing accurate alignments an important task. Many factors, such as training dataset size, training data augmentation, and model hyperparameters, affect model learning. Traditionally in machine learning, models trained on larger datasets tend to perform better, while smaller datasets tend to cause over-fitting. Collecting speech data and producing accurate transcriptions is a significant challenge that varies across languages, and in most cases it may be feasible only for large organizations. Moreover, even when a large dataset is available, training a model on it requires additional time and computing resources, which may not be available. While the accuracy of state-of-the-art ASR models on open-source datasets is published, studies on the impact of dataset size on acoustic models are not readily available. This work investigates the impact of dataset size variations on the performance of various GMM-HMM acoustic models and their respective computational costs.
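The kind of sweep the abstract describes (train the same family of GMM-HMM acoustic models on progressively larger subsets of the data and record both a quality measure and the training cost) can be illustrated with a minimal Python sketch. This is not the paper's implementation: it assumes hmmlearn's GMMHMM as a stand-in for a full acoustic model, random vectors in place of MFCC features, and held-out log-likelihood per frame as a proxy for the word error rate the paper actually measures.

```python
# Minimal sketch of a dataset-size sweep for GMM-HMM training costs.
# Assumptions (not from the paper): hmmlearn's GMMHMM replaces a full
# acoustic model, and random 39-dim vectors stand in for MFCC features.
import time

import numpy as np
from hmmlearn.hmm import GMMHMM

rng = np.random.default_rng(0)

# Placeholder "corpus": 2000 utterances of 39-dimensional frames
# (39 = typical MFCC + delta + delta-delta dimensionality).
utterances = [rng.normal(size=(rng.integers(80, 200), 39)) for _ in range(2000)]

def train_and_time(utts, n_states=5, n_mix=8):
    """Train one GMM-HMM on a subset of utterances; return (model, seconds)."""
    X = np.concatenate(utts)          # stack frames of all utterances
    lengths = [len(u) for u in utts]  # per-utterance frame counts
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=10, random_state=0)
    start = time.perf_counter()
    model.fit(X, lengths)
    return model, time.perf_counter() - start

# Held-out set for a crude quality proxy (the paper reports WER instead).
held_out = utterances[-200:]
X_dev = np.concatenate(held_out)
dev_lengths = [len(u) for u in held_out]

# Sweep over training-set sizes, recording fit time and dev log-likelihood.
for size in (100, 400, 800, 1600):
    model, seconds = train_and_time(utterances[:size])
    ll = model.score(X_dev, dev_lengths) / sum(dev_lengths)
    print(f"train utts={size:5d}  fit time={seconds:7.1f}s  "
          f"dev log-lik/frame={ll:8.3f}")
```

In a full pipeline of the kind the abstract refers to, each training subset would instead feed the standard GMM-HMM stages of an ASR recipe, and the alignments produced by the best GMM model would then be used to train the hybrid DNN-HMM acoustic model.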