DDKtor: Automatic Diadochokinetic Speech Analysis

Paper title

DDKtor: Automatic Diadochokinetic Speech Analysis

Authors

Segal, Yael, Hitczenko, Kasia, Goldrick, Matthew, Buchwald, Adam, Roberts, Angela, Keshet, Joseph

Abstract

Diadochokinetic speech tasks (DDK), in which participants repeatedly produce syllables, are commonly used as part of the assessment of speech motor impairments. These studies rely on manual analyses that are time-intensive, subjective, and provide only a coarse-grained picture of speech. This paper presents two deep neural network models that automatically segment consonants and vowels from unannotated, untranscribed speech. Both models work on the raw waveform and use convolutional layers for feature extraction. The first model is based on an LSTM classifier followed by fully connected layers, while the second model adds more convolutional layers followed by fully connected layers. The segmentations predicted by the models are used to obtain measures of speech rate and sound duration. Results on a dataset of young healthy individuals show that our LSTM model outperforms the current state-of-the-art systems and performs comparably to trained human annotators. Moreover, the LSTM model also performs comparably to trained human annotators when evaluated on an unseen dataset of older individuals with Parkinson's Disease.
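To illustrate how a predicted segmentation could be turned into speech-rate and duration measures, here is a minimal Python sketch. It is not the authors' code. The frame labels ("S" for silence, "C" for consonant, "V" for vowel), the fixed frame length `frame_sec`, the function names, and the choice to count each vowel segment as one syllable are all assumptions made for illustration.

```python
from itertools import groupby


def segments_from_frames(labels, frame_sec):
    """Collapse per-frame labels into (label, start_sec, end_sec) runs.

    Assumption: every frame has the same duration, frame_sec.
    """
    segments = []
    start = 0  # index of the first frame of the current run
    for label, run in groupby(labels):
        n = len(list(run))
        segments.append((label, start * frame_sec, (start + n) * frame_sec))
        start += n
    return segments


def ddk_measures(labels, frame_sec):
    """Derive simple rate and duration measures from frame labels.

    Assumptions (not taken from the paper): labels are "S", "C", "V",
    and each vowel segment counts as one syllable.
    """
    segs = segments_from_frames(labels, frame_sec)
    vowels = [end - begin for lab, begin, end in segs if lab == "V"]
    consonants = [end - begin for lab, begin, end in segs if lab == "C"]
    total_sec = len(labels) * frame_sec

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return {
        "syllables_per_sec": len(vowels) / total_sec if total_sec else 0.0,
        "mean_vowel_sec": mean(vowels),
        "mean_consonant_sec": mean(consonants),
    }


if __name__ == "__main__":
    # Hypothetical model output: two consonant-vowel syllables, 10 ms frames.
    frames = list("SSCCVVVVSSCCVVVVSSSS")
    print(ddk_measures(frames, frame_sec=0.01))
```

In practice the rate might be computed over the voiced span only rather than over the whole recording. The sketch uses the full duration for simplicity.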
