Adversarial Audio Synthesis with Complex-valued Polynomial Networks

Paper title

Adversarial Audio Synthesis with Complex-valued Polynomial Networks

Authors

Yongtao Wu, Grigorios G. Chrysos, Volkan Cevher

Abstract

Time-frequency (TF) representations in audio synthesis have been increasingly modeled with real-valued networks. However, overlooking the complex-valued nature of TF representations can result in suboptimal performance and require additional modules (e.g., for modeling the phase). To this end, we introduce complex-valued polynomial networks, called APOLLO, that integrate such complex-valued representations in a natural way. Concretely, APOLLO captures high-order correlations of the input elements using high-order tensors as scaling parameters. By leveraging standard tensor decompositions, we derive different architectures and enable modeling of richer correlations. We outline such architectures and showcase their performance in audio generation across four benchmarks. As a highlight, APOLLO yields a $17.5\%$ improvement over adversarial methods and an $8.2\%$ improvement over state-of-the-art diffusion models on the SC09 dataset in audio generation. Our models can encourage the systematic design of other efficient architectures on the complex field.
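To make the idea of a complex-valued polynomial expansion with decomposed high-order tensors more concrete, here is a minimal NumPy sketch. It is not the authors' APOLLO architecture: it assumes a CP-style recursion of the kind used in earlier polynomial networks, and the function name `complex_poly_forward`, the factor matrices `U`, the output matrix `C`, the bias `beta`, and all sizes are hypothetical choices made for illustration only.

```python
import numpy as np

def complex_poly_forward(z, U, C, beta):
    """Degree-N polynomial of a complex input z via a CP-style recursion.

    Assumed (hypothetical) shapes: z is (d,), each U[n] is (d, k),
    C is (k, o), beta is (o,). All arrays are complex-valued.
    """
    x = U[0].T @ z                   # first-order (linear) term
    for U_n in U[1:]:
        # The Hadamard product with a new linear map of z raises the degree by one;
        # adding x back keeps the lower-order terms.
        x = (U_n.T @ z) * x + x
    return C.T @ x + beta

rng = np.random.default_rng(0)
d, k, o, N = 4, 8, 3, 3              # illustrative sizes, degree-3 polynomial

def crandn(*shape):
    # Random complex array with independent real and imaginary parts.
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

U = [crandn(d, k) for _ in range(N)]
C = crandn(k, o)
beta = crandn(o)
z = crandn(d)                        # stands in for one complex TF feature vector
y = complex_poly_forward(z, U, C, beta)
```

In this sketch the full high-order parameter tensors are never materialized: the factor matrices in `U` play the role of a low-rank decomposition, so the parameter count grows linearly with the polynomial degree rather than exponentially. Because every quantity stays complex, magnitude and phase are processed jointly and no separate phase module is needed.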
