Paper Title
Acoustically-Driven Phoneme Removal That Preserves Vocal Affect Cues
Paper Authors
Abstract
In this paper, we propose a method for removing linguistic information from speech in order to isolate paralinguistic indicators of affect. The immediate utility of this method lies in clinical tests of sensitivity to vocal affect that are not confounded by language, which is itself impaired in a variety of clinical populations. The method is based on simultaneous recordings of speech audio and electroglottographic (EGG) signals. The speech audio signal is used to estimate the average vocal tract filter response and the amplitude envelope. The EGG signal supplies a direct correlate of voice source activity that is mostly independent of phonetic articulation. The dynamic energy of the speech audio and the average vocal tract filter are applied to the EGG signal to create a third signal designed to capture as much paralinguistic information from the vocal production system as possible, maximizing the retention of bioacoustic cues to affect while eliminating phonetic cues to verbal meaning. To evaluate the success of this method, we studied the perception of corresponding speech audio and transformed EGG signals in an affect rating experiment with online listeners. The results show a high degree of similarity in the perceived affect of matched signals, indicating that our method is effective.
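The signal chain described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the average vocal tract filter or the amplitude envelope are estimated, so everything below is an assumption made for illustration: the long-term average magnitude spectrum of the speech stands in for the average vocal tract filter (it also contains source and radiation characteristics, so it is only a rough proxy), a moving-RMS curve stands in for the dynamic energy, and the function names and default parameters (`frame_len`, `hop`, `env_win`) are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames, one frame per row (needs len(x) >= frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def average_magnitude_response(speech, frame_len=1024, hop=256):
    """Long-term average magnitude spectrum of the speech, peak-normalised.

    Assumption: used here as a crude proxy for the average vocal tract filter.
    """
    frames = frame_signal(speech, frame_len, hop) * np.hanning(frame_len)
    avg = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    return avg / (avg.max() + 1e-12)


def rms_envelope(x, win):
    """Moving-RMS amplitude envelope with the same length as x."""
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(x ** 2, kernel, mode="same"))


def transform_egg(speech, egg, frame_len=1024, hop=256, env_win=512):
    """Shape the EGG signal with the speech's average spectrum and energy contour."""
    n = min(len(speech), len(egg))
    speech, egg = speech[:n], egg[:n]

    # 1. Estimate the average filter from speech and resample it onto the
    #    frequency grid of a full-length FFT of the EGG signal.
    avg_mag = average_magnitude_response(speech, frame_len, hop)
    coarse = np.linspace(0.0, 1.0, len(avg_mag))
    fine = np.linspace(0.0, 1.0, n // 2 + 1)
    response = np.interp(fine, coarse, avg_mag)

    # 2. Apply it to the EGG signal as a zero-phase filter in the frequency
    #    domain (circular convolution; acceptable for a sketch).
    filtered = np.fft.irfft(np.fft.rfft(egg) * response, n=n)

    # 3. Replace the filtered EGG's own energy contour with that of the speech.
    target_env = rms_envelope(speech, env_win)
    own_env = rms_envelope(filtered, env_win)
    return filtered / (own_env + 1e-8) * target_env
```

Given time-aligned `speech` and `egg` arrays recorded at the same sampling rate, `transform_egg(speech, egg)` returns a signal that follows the loudness contour and average spectral shape of the speech while taking its fine structure from the EGG waveform. A linear-prediction estimate averaged over voiced frames would be a plausible alternative for step 1.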