Chronological Self-Training for Real-Time Speaker Diarization

Paper title

Chronological Self-Training for Real-Time Speaker Diarization

Authors

Dirk Padfield, Daniel J. Liebling

Abstract

Diarization partitions an audio stream into segments based on the voices of the speakers. Real-time diarization systems that include an enrollment step should limit enrollment training samples to reduce user interaction time. Although training on a small number of samples yields poor performance, we show that the accuracy can be improved dramatically using a chronological self-training approach. We studied the tradeoff between training time and classification performance and found that 1 second is sufficient to reach over 95% accuracy. We evaluated on 700 audio conversation files of about 10 minutes each from 6 different languages and demonstrated average diarization error rates as low as 10%.
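
The abstract does not say which features or classifier the authors use, so the following is only a minimal sketch of the general idea of chronological self-training, not the paper's implementation. It assumes a nearest-centroid classifier over per-frame feature vectors. The function name `chronological_self_training` and the `margin` parameter are hypothetical. Each speaker starts with a centroid built from a few enrollment frames. The stream is then processed in time order, and every confidently labeled frame is folded back into its speaker's centroid, so the model keeps improving after enrollment ends.

```python
import numpy as np


def chronological_self_training(enroll, stream, margin=0.0):
    """Label frames in time order, updating speaker centroids as we go.

    enroll: dict mapping speaker id -> list/array of enrollment feature vectors.
    stream: sequence of feature vectors in chronological order.
    margin: (assumed) minimum gap between the best and second-best distance
            required before a frame is used to update the model.
    Returns one predicted speaker id per frame.
    """
    ids = sorted(enroll)
    # Running sums and counts let us update each centroid incrementally.
    sums = np.stack([np.asarray(enroll[s], dtype=float).sum(axis=0) for s in ids])
    counts = np.array([len(enroll[s]) for s in ids], dtype=float)

    labels = []
    for x in np.asarray(stream, dtype=float):
        centroids = sums / counts[:, None]
        dists = np.linalg.norm(centroids - x, axis=1)
        order = np.argsort(dists)
        best = order[0]
        labels.append(ids[best])
        # Self-training step: trust the prediction only when it is confident.
        confident = len(ids) == 1 or dists[order[1]] - dists[best] >= margin
        if confident:
            sums[best] += x
            counts[best] += 1
    return labels


if __name__ == "__main__":
    # Toy 2-D features: one enrollment frame per speaker, then a short stream.
    enroll = {"a": [[0.0, 0.0]], "b": [[10.0, 10.0]]}
    stream = [[0.5, 0.2], [9.0, 9.5], [1.0, 1.0], [8.0, 8.0]]
    print(chronological_self_training(enroll, stream))
```

Raising `margin` makes the update more conservative. Fewer frames are absorbed into the centroids, which in this sketch trades adaptation speed against the risk of reinforcing an early mistake.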
