A Thousand Words are Worth More Than One Recording: NLP Based Speaker Change Point Detection

Paper Title


A Thousand Words are Worth More Than One Recording: NLP Based Speaker Change Point Detection

Authors

Anidjar, O. H., Hajaj, C., Dvir, A., Gilad, I.

Abstract


Speaker Diarization (SD) consists of splitting or segmenting an input audio burst according to speaker identity. In this paper, we focus on a crucial subtask of SD, the audio segmentation process, and propose a solution to the Change Point Detection (CPD) problem. We empirically demonstrate a negative correlation between the number of speakers and the Recall and F1-score measurements. This negative correlation emerges from a large-scale experimental evaluation, which accounts for its superiority over recently developed voice-based solutions. To overcome the number-of-speakers issue, we propose a robust solution based on a novel Natural Language Processing (NLP) technique together with a metadata feature extraction process, rather than a purely voice-based one. To the best of our knowledge, we are the first to propose an intelligent NLP-based solution that (I) tackles the CPD problem on a Hebrew dataset, and (II) solves the CPD variant of the SD problem. Based on two distinct datasets, we empirically show that our method accurately identifies change points in an audio burst, achieving 82.12% Recall and 89.02% F1-score.
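The abstract reports Recall and F1-score for change point detection but does not state how predicted change points are matched to reference ones. A common convention, assumed here and not taken from the paper, is to count a predicted change point as correct if it falls within a tolerance window (0.5 s below, an arbitrary choice) of a not-yet-matched reference change point. The sketch below uses a simple greedy one-to-one matching; the function names `match_change_points` and `cpd_scores` are hypothetical, and this is not the authors' evaluation protocol.

```python
from typing import Sequence, Tuple


def match_change_points(
    predicted: Sequence[float],
    reference: Sequence[float],
    tolerance: float,
) -> int:
    """Count predicted change points (in seconds) that lie within `tolerance`
    of an unmatched reference change point. Greedy: each prediction, in time
    order, takes the closest still-unused reference point; each reference
    point can be matched at most once."""
    ref = sorted(reference)
    used = [False] * len(ref)
    hits = 0
    for p in sorted(predicted):
        best = None
        best_dist = tolerance
        for i, r in enumerate(ref):
            d = abs(p - r)
            if not used[i] and d <= best_dist:
                best, best_dist = i, d
        if best is not None:
            used[best] = True
            hits += 1
    return hits


def cpd_scores(
    predicted: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 0.5,  # assumed tolerance window in seconds
) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for a set of predicted change points."""
    hits = match_change_points(predicted, reference, tolerance)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    reference = [3.0, 7.5, 12.0]       # hypothetical speaker-turn boundaries
    predicted = [3.2, 7.0, 9.0, 12.4]  # hypothetical detector output
    print(cpd_scores(predicted, reference))
```

In this toy example three of the four predictions fall within 0.5 s of a distinct reference boundary, giving precision 0.75, recall 1.0 and F1 of about 0.857. Greedy matching is not guaranteed to be the optimal one-to-one assignment; an optimal bipartite matching could be substituted if that matters for the evaluation.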
