Title

Chronological Self-Training for Real-Time Speaker Diarization

Authors

Dirk Padfield, Daniel J. Liebling

Abstract

Diarization partitions an audio stream into segments based on the voices of the speakers. Real-time diarization systems that include an enrollment step should limit enrollment training samples to reduce user interaction time. Although training on a small number of samples yields poor performance, we show that the accuracy can be improved dramatically using a chronological self-training approach. We studied the tradeoff between training time and classification performance and found that 1 second is sufficient to reach over 95% accuracy. We evaluated on 700 audio conversation files of about 10 minutes each from 6 different languages and demonstrated average diarization error rates as low as 10%.
