Multi-path RNN for hierarchical modeling of long sequential data and its application to speaker stream separation

Paper title

Multi-path RNN for hierarchical modeling of long sequential data and its application to speaker stream separation

Authors

Keisuke Kinoshita, Thilo von Neumann, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Abstract

Recently, time-domain audio source separation based on the dual-path recurrent neural network (DPRNN) has greatly improved source separation performance. DPRNN is a simple but effective model for long sequential data. While DPRNN is quite efficient at modeling sequential data of utterance length, i.e., about 5 to 10 seconds of data, it is harder to apply to longer sequences such as whole conversations consisting of multiple utterances. This is simply because, in such cases, the number of time steps consumed by its internal module, called the inter-chunk RNN, becomes extremely large. To mitigate this problem, this paper proposes a multi-path RNN (MPRNN), a generalized version of DPRNN that models the input data in a hierarchical manner. In the MPRNN framework, the input data is represented at several (≥3) time resolutions, each of which is modeled by a specific RNN sub-module. For example, the RNN sub-module that deals with the finest resolution may model temporal relationships only within a phoneme, while the RNN sub-module handling the coarsest resolution may capture only relationships between utterances, such as speaker information. We perform experiments using simulated dialogue-like mixtures and show that MPRNN has greater model capacity and outperforms the current state-of-the-art DPRNN framework, especially in online processing scenarios.
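
The abstract's point about the inter-chunk RNN can be illustrated with simple arithmetic. The sketch below is not the authors' implementation: it assumes non-overlapping chunks, and the function name `path_lengths`, the frame count, and the chunk sizes are hypothetical values chosen only for illustration. It counts how many time steps each RNN path would process when a sequence is split into chunks, then into chunks of chunks, and so on.

```python
import math


def path_lengths(num_frames, chunk_sizes):
    """Return the number of time steps seen by each RNN path.

    Assumption (not from the paper): chunks do not overlap. Each entry in
    chunk_sizes adds one finer path; the final entry of the result is the
    coarsest path, which runs across whatever chunks remain.
    """
    lengths = []
    remaining = num_frames
    for size in chunk_sizes:
        # A path at this level runs over at most `size` steps per chunk.
        lengths.append(min(size, remaining))
        # The next, coarser level sees one step per chunk of this level.
        remaining = math.ceil(remaining / size)
    lengths.append(remaining)
    return lengths


# Hypothetical long recording of one million frames.
num_frames = 1_000_000

# Two paths (DPRNN-style): intra-chunk and inter-chunk RNNs.
dual = path_lengths(num_frames, [1000])

# Three paths: the coarsest RNN now runs over far fewer steps.
triple = path_lengths(num_frames, [100, 100])

print("dual-path steps per RNN:  ", dual)
print("triple-path steps per RNN:", triple)
```

With two paths, the best case under these assumptions keeps both RNNs at roughly the square root of the sequence length. Adding a third path lowers that to roughly the cube root, which is the motivation the abstract gives for using three or more time resolutions on conversation-length input.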
