Paper title
Wireless Deep Speech Semantic Transmission
Paper authors
Abstract
In this paper, we propose a new class of high-efficiency semantic coded transmission methods for end-to-end speech transmission over wireless channels. We name the whole system deep speech semantic transmission (DSST). Specifically, we introduce a nonlinear transform that maps the speech source to a semantic latent space, and we feed the semantic features into a source-channel encoder to generate the channel-input sequence. Guided by the variational modeling idea, we build an entropy model on the latent space to estimate the importance diversity among semantic feature embeddings. Accordingly, semantic features of different importance can be allocated different coding rates, which maximizes the system coding gain. Furthermore, we introduce a channel signal-to-noise ratio (SNR) adaptation mechanism so that a single model can be applied over various channel states. The end-to-end optimization of our model leads to a flexible rate-distortion (RD) trade-off, supporting versatile wireless speech semantic transmission. Experimental results verify that our DSST system clearly outperforms current engineered speech transmission systems on both objective and subjective metrics. Compared with existing neural speech semantic transmission methods, our model saves up to 75% of channel bandwidth cost when achieving the same quality. An intuitive comparison of audio demos can be found at https://ximoo123.github.io/DSST.
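The entropy-guided rate allocation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Gaussian entropy model over integer-rounded latent values, and the names `gaussian_bits`, `allocate_symbols`, the scaling factor `eta`, and the set `allowed_rates` are all hypothetical, introduced only for illustration. The idea shown is that a feature the entropy model finds less probable (more informative) receives a larger channel-symbol budget.

```python
import math

def gaussian_bits(y, mu, sigma):
    """Estimate the bits needed to code the rounded value y under N(mu, sigma^2),
    using the probability mass of the interval [y - 0.5, y + 0.5]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    p = max(cdf(y + 0.5) - cdf(y - 0.5), 1e-9)  # floor avoids log(0)
    return -math.log2(p)

def allocate_symbols(bits_per_feature, eta, allowed_rates):
    """Map each feature's estimated entropy to a channel-symbol budget.

    eta (symbols per bit) and allowed_rates are assumed parameters:
    each target eta * bits is snapped to the nearest allowed rate.
    """
    rates = []
    for b in bits_per_feature:
        target = eta * b
        rates.append(min(allowed_rates, key=lambda r: abs(r - target)))
    return rates

# Toy latent features with a shared (assumed) prior N(0, 1).
latents = [0.0, 3.0, -6.0]
mus = [0.0, 0.0, 0.0]
sigmas = [1.0, 1.0, 1.0]

bits = [gaussian_bits(y, m, s) for y, m, s in zip(latents, mus, sigmas)]
rates = allocate_symbols(bits, eta=0.5, allowed_rates=[1, 2, 4, 8, 16])
print(bits, rates)
```

In this toy setup, the feature closest to the prior mean costs the fewest bits and gets the smallest budget, while the outlier gets the largest. Changing `eta` shifts the overall bandwidth use, which loosely mirrors the rate-distortion trade-off mentioned above.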