Paper Title

Music-robust Automatic Lyrics Transcription of Polyphonic Music

Authors

Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

Abstract

Lyrics transcription of polyphonic music is challenging because the singing vocals are corrupted by the background music. To improve the robustness of lyrics transcription to background music, we propose combining two sets of features: music-removed features, which emphasize the singing vocals and are computed from the extracted singing vocals, and music-present features, which capture the singing vocals together with the background music. We show that these two sets of features complement each other and that their combination performs better than either set alone, thus improving the robustness of the acoustic model to background music. Furthermore, interpolating a general-purpose language model with an in-domain, lyrics-specific language model further improves the transcription results. Our experiments show that the proposed strategy outperforms existing lyrics transcription systems for polyphonic music. Moreover, we find that the proposed music-robust features especially improve lyrics transcription performance for songs in the metal genre, where the background music is loud and dominant.
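
The language model interpolation mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain linear interpolation, P(w) = lam * P_general(w) + (1 - lam) * P_lyrics(w), applied to toy unigram tables. The abstract does not state the interpolation weight, the model order, or the toolkit used, so the function name `interpolate`, the weight `lam`, and the probabilities below are all hypothetical and serve only to show the idea.

```python
def interpolate(p_general, p_lyrics, lam):
    """Linearly interpolate two word-probability tables.

    lam is the weight on the general-purpose model and (1 - lam) the
    weight on the lyrics-specific model. Both the weight and the toy
    tables used below are assumptions, not values from the paper.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_general) | set(p_lyrics)
    # A word missing from one model contributes probability 0 from it.
    return {
        w: lam * p_general.get(w, 0.0) + (1.0 - lam) * p_lyrics.get(w, 0.0)
        for w in vocab
    }


# Toy unigram distributions (hypothetical); each table sums to 1.
p_general = {"the": 0.5, "love": 0.1, "market": 0.4}
p_lyrics = {"the": 0.3, "love": 0.6, "baby": 0.1}

mixed = interpolate(p_general, p_lyrics, lam=0.5)
for word in sorted(mixed):
    print(f"{word}: {mixed[word]:.3f}")
```

Because each input table sums to 1 and the two weights sum to 1, the interpolated table is again a valid distribution. In practice the weight would typically be tuned on held-out in-domain text, for example by minimizing perplexity on a development set of lyrics.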
