Paper Title
Detection of AI Synthesized Hindi Speech
Paper Authors
Paper Abstract
Recent advancements in generative artificial speech models have made it possible to generate highly realistic speech signals. At first it seems exciting to obtain such artificially synthesized signals, for example speech clones or deep fakes, but if left unchecked they may lead us to a digital dystopia. One of the primary focuses of audio forensics is validating the authenticity of speech. Although some solutions have been proposed for English speech, the detection of synthetic Hindi speech has not gained much attention. Here, we propose an approach for discriminating AI-synthesized Hindi speech from actual human speech. We exploit the bicoherence phase, bicoherence magnitude, Mel Frequency Cepstral Coefficients (MFCC), delta cepstral, and delta-square cepstral features as the discriminating features for machine learning models. We also extend the study with extensive experiments using deep neural networks, specifically VGG16 and a homemade CNN as the architecture models. We obtained an accuracy of 99.83% with VGG16 and 99.99% with the homemade CNN model.
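The abstract names bicoherence magnitude and phase among the discriminating features but gives no implementation details. As an illustration only, a minimal NumPy sketch of the standard segment-averaged bicoherence estimator (the `nfft` and `hop` parameters here are illustrative choices, not values from the paper) might look like:

```python
import numpy as np

def bicoherence(x, nfft=64, hop=32):
    """Estimate bicoherence magnitude and phase of a 1-D signal.

    Splits x into overlapping windowed segments, takes the FFT of each,
    and averages the triple product X(f1) * X(f2) * conj(X(f1 + f2))
    over segments, normalized so the magnitude lies in [0, 1].
    """
    # Overlapping Hann-windowed segments, one FFT per segment.
    segs = [x[i:i + nfft] for i in range(0, len(x) - nfft + 1, hop)]
    X = np.fft.fft(np.array(segs) * np.hanning(nfft), axis=1)

    n = nfft // 2  # keep f1, f2 < nfft/2 so f1 + f2 stays a valid bin
    idx = np.add.outer(np.arange(n), np.arange(n))  # bin index f1 + f2

    num = np.zeros((n, n), dtype=complex)
    den1 = np.zeros((n, n))
    den2 = np.zeros((n, n))
    for Xk in X:
        prod = np.outer(Xk[:n], Xk[:n])   # X(f1) * X(f2)
        sumf = Xk[idx]                    # X(f1 + f2)
        num += prod * np.conj(sumf)
        den1 += np.abs(prod) ** 2
        den2 += np.abs(sumf) ** 2

    # Cauchy-Schwarz guarantees the normalized magnitude is at most 1.
    b = num / (np.sqrt(den1 * den2) + 1e-12)
    return np.abs(b), np.angle(b)
```

The magnitude plane captures how strongly frequency pairs are quadratically phase-coupled, and the phase plane gives the coupling phase; both can then be flattened or imaged as input features for a classifier.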