Paper title

Speaker discrimination in humans and machines: Effects of speaking style variability

Authors

Afshan, Amber, Kreiman, Jody, Alwan, Abeer

Abstract

Does speaking style variation affect humans' ability to distinguish individuals from their voices? How do humans compare with automatic systems designed to discriminate between voices? In this paper, we attempt to answer these questions by comparing human and machine speaker discrimination performance for read speech versus casual conversations. Thirty listeners were asked to perform a same-versus-different speaker task. Their performance was compared to a state-of-the-art x-vector/PLDA-based automatic speaker verification system. Results showed that both humans and machines performed better with style-matched stimuli, and human performance was better when listeners were native speakers of American English. Native listeners performed better than machines in the style-matched conditions (EERs of 6.96% versus 14.35% for read speech, and 15.12% versus 19.87% for conversations), but for style-mismatched conditions, there was no significant difference between native listeners and machines. In all conditions, fusing human responses with machine results showed improvements compared to each alone, suggesting that humans and machines have different approaches to speaker discrimination tasks. Differences in the approaches were further confirmed by examining results for individual speakers, which showed that the perception of distinct and confused speakers differed between human listeners and machines.
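The abstract reports performance as equal error rates (EERs). As a reading aid, here is a minimal sketch of how an EER can be computed from a list of trial scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are all assumptions made for illustration, and the numbers are not data from the paper.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Return (eer, threshold) for a same/different speaker task.

    scores: higher values mean "more likely the same speaker".
    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    Hypothetical helper for illustration; not taken from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    targets = scores[labels == 1]
    nontargets = scores[labels == 0]

    best_gap, eer, best_threshold = np.inf, None, None
    # Sweep every observed score as a candidate decision threshold.
    for t in np.unique(scores):
        false_accept = np.mean(nontargets >= t)  # different speakers accepted
        false_reject = np.mean(targets < t)      # same speakers rejected
        gap = abs(false_accept - false_reject)
        if gap < best_gap:
            # The EER is where the two error rates are (closest to) equal.
            best_gap = gap
            eer = (false_accept + false_reject) / 2
            best_threshold = t
    return eer, best_threshold

# Toy scores, invented for this example only.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means better discrimination, which is how the figures quoted in the abstract should be read: 6.96% for native listeners is better than 14.35% for the machine on read speech.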
