PodcastMix: A dataset for separating music and speech in podcasts

Paper title

PodcastMix: A dataset for separating music and speech in podcasts

Authors

Schmidt, Nicolás; Pons, Jordi; Miron, Marius

Abstract

We introduce PodcastMix, a dataset formalizing the task of separating background music and foreground speech in podcasts. We aim to define a benchmark suitable for training and evaluating (deep learning) source separation models. To that end, we release a large and diverse training dataset based on programmatically generated podcasts. However, current (deep learning) models can run into generalization issues, especially when trained on synthetic data. To target potential generalization issues, we release an evaluation set based on real podcasts, for which we design objective and subjective tests. In our experiments with real podcasts, we find that current (deep learning) models may have generalization issues. Yet these models can perform competently: for example, our best baseline separates speech with a mean opinion score of 3.84 (rating "overall separation quality" from 1 to 5). The dataset and baselines are accessible online.
