Data-driven Detection and Analysis of the Patterns of Creaky Voice

Paper title

Data-driven Detection and Analysis of the Patterns of Creaky Voice

Authors

Drugman, Thomas, Kane, John, Gobl, Christer

Abstract

This paper investigates the temporal excitation patterns of creaky voice. Creaky voice is a voice quality frequently used as a phrase-boundary marker, but also as a means of portraying attitude, affective states and even social status. Consequently, the automatic detection and modelling of creaky voice may have implications for speech technology applications. The acoustic characteristics of creaky voice are, however, rather distinct from those of modal phonation. Further, several acoustic patterns can bring about the perception of creaky voice, thereby complicating the strategies used for its automatic detection, analysis and modelling. The present study uses a variety of languages and speakers, on both read and conversational data, and involves a mutual information-based assessment of the various acoustic features proposed in the literature for detecting creaky voice. These features are then exploited in classification experiments where we achieve an appreciable improvement in detection accuracy compared to the state of the art. Both experiments clearly highlight the presence of several creaky patterns. A subsequent qualitative and quantitative analysis of the identified patterns is provided, which reveals considerable speaker-dependent variability in the usage of these creaky patterns. We also investigate how creaky voice detection systems perform across creaky patterns.
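The abstract mentions a mutual information-based assessment of acoustic features. As a minimal sketch only, and not the authors' implementation, the mutual information between a quantized per-frame feature and binary creaky/non-creaky labels can be estimated with a plug-in (histogram) estimator. The equal-width binning and the helper names `quantize` and `mutual_information` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def quantize(values, n_bins):
    """Map continuous feature values to equal-width bin indices 0..n_bins-1 (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; put everything in one bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls in the last bin rather than one past it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(feature_bins, labels):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    if len(feature_bins) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))  # counts of (bin, label) pairs
    px = Counter(feature_bins)                  # marginal counts of feature bins
    py = Counter(labels)                        # marginal counts of labels
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical per-frame feature values and labels (1 = creaky frame, 0 = not).
    feature = [0.1, 0.2, 0.9, 0.8]
    labels = [0, 0, 1, 1]
    print(mutual_information(quantize(feature, 2), labels))
```

With balanced binary labels, a feature whose bins perfectly separate creaky from non-creaky frames reaches the label entropy of 1 bit, while a feature whose bins are independent of the labels scores 0; ranking features by this score is one simple way to compare their relevance for detection.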