Conversational Speech Separation: an Evaluation Study for Streaming Applications

Paper title

Conversational Speech Separation: an Evaluation Study for Streaming Applications

Authors

Giovanni Morrone, Samuele Cornell, Enrico Zovato, Alessio Brutti, Stefano Squartini

Abstract

Continuous speech separation (CSS) is a recently proposed framework that aims to separate each speaker from an input mixture signal in a streaming fashion. We present an evaluation study of practical design considerations for a CSS system, addressing important aspects that have been neglected in recent work. In particular, we focus on the trade-off between separation performance, computational requirements, and output latency, showing how an offline separation algorithm can be used to perform CSS with a desired latency. We carry out an extensive analysis of the choice of CSS processing window size and hop size on sparsely overlapped data. We find that the best trade-off between computational burden and performance is obtained with a window of 5 s.
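The window-size and hop-size choice discussed in the abstract can be illustrated with a minimal sketch. It only enumerates the sliding windows that an offline separator would be run on; it is not the authors' implementation. The function names, the 16 kHz sample rate, and the 2.5 s hop are assumptions chosen for illustration; only the 5 s window comes from the abstract. Stitching the per-window outputs back into continuous streams is omitted.

```python
def css_windows(num_samples, sample_rate, window_s, hop_s):
    """Return (start, end) sample indices of sliding windows covering the signal.

    Hypothetical helper: each window would be passed to an offline
    separation model; consecutive windows overlap by (window - hop).
    """
    window = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if window <= 0 or hop <= 0 or hop > window:
        raise ValueError("need 0 < hop <= window")
    spans = []
    start = 0
    while True:
        end = min(start + window, num_samples)  # last window may be shorter
        spans.append((start, end))
        if end >= num_samples:
            break
        start += hop
    return spans


def overlap_factor(window_s, hop_s):
    """Rough cost proxy: how many windows cover each sample on average."""
    return window_s / hop_s


# Example: 10 s of audio at an assumed 16 kHz, 5 s window, assumed 2.5 s hop.
spans = css_windows(10 * 16000, 16000, window_s=5.0, hop_s=2.5)
```

A smaller hop lets each new stretch of audio be processed sooner, but increases the overlap factor and therefore the number of times the separator runs over the same samples. This is one simple way to read the performance, computation, and latency trade-off the abstract describes.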
