Paper Title
Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization
Paper Authors
Paper Abstract
This paper investigates a method for simulating natural conversation in the model training of end-to-end neural diarization (EEND). Due to the lack of any annotated real conversational dataset, EEND is usually pretrained on a large-scale simulated conversational dataset first and then adapted to the target real dataset. Simulated datasets play an essential role in the training of EEND, but as yet there has been insufficient investigation into an optimal simulation method. We thus propose a method to simulate natural conversational speech. In contrast to conventional methods, which simply combine the speech of multiple speakers, our method takes turn-taking into account. We define four types of speaker transition and sequentially arrange them to simulate natural conversations. The dataset simulated using our method was found to be statistically similar to the real dataset in terms of the silence and overlap ratios. The experimental results on two-speaker diarization using the CALLHOME and CSJ datasets showed that the simulated dataset contributes to improving the performance of EEND.
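The abstract compares simulated and real data by their silence and overlap ratios but does not define them. The sketch below shows one plausible way to compute the two statistics, assuming each ratio is a fraction of the total recording duration and that a conversation is given as `(speaker, start, end)` segments in seconds. The function name `silence_and_overlap_ratios` and the segment format are illustrative assumptions, not taken from the paper.

```python
def silence_and_overlap_ratios(segments, total_duration):
    """Return (silence_ratio, overlap_ratio) for one conversation.

    Assumptions (not from the paper): segments is a list of
    (speaker, start, end) tuples in seconds, segments of the same
    speaker do not overlap each other, and both ratios are taken
    relative to total_duration.
    """
    # Turn every segment into a +1 event at its start and a -1 event at its end.
    events = []
    for _speaker, start, end in segments:
        events.append((start, 1))
        events.append((end, -1))
    # At equal times, -1 sorts before +1, so segments that merely touch
    # are not counted as overlapping.
    events.sort()

    silence = 0.0
    overlap = 0.0
    active = 0   # number of speakers talking in the current span
    prev = 0.0   # time of the previous event
    for time, delta in events:
        span = time - prev
        if active == 0:
            silence += span
        elif active >= 2:
            overlap += span
        prev = time
        active += delta
    # Anything after the last segment is trailing silence.
    silence += total_duration - prev
    return silence / total_duration, overlap / total_duration


if __name__ == "__main__":
    # Hypothetical 10-second, two-speaker conversation.
    example = [("A", 0.0, 4.0), ("B", 3.0, 6.0), ("A", 8.0, 10.0)]
    silence_ratio, overlap_ratio = silence_and_overlap_ratios(example, 10.0)
    print(f"silence={silence_ratio:.2f} overlap={overlap_ratio:.2f}")
```

Under these assumptions, comparing the two ratios between a simulated corpus and a real one gives a rough check of how closely the simulation matches real turn-taking behavior.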