Title
Acoustic Word Embedding System for Code-Switching Query-by-Example Spoken Term Detection
Authors
Abstract
In this paper, we propose an acoustic word embedding system based on a deep convolutional neural network for code-switching query-by-example spoken term detection. Unlike previous configurations, which train on a single language, we combine audio data from two languages for training. We transform the acoustic features of keyword templates and search content into fixed-dimensional vectors and compute the distances between keyword segments and search content segments obtained with a sliding window. An auxiliary variability-invariant loss is also applied to training examples of the same word spoken by different speakers. This strategy is used to prevent the extractor from encoding undesired speaker- or accent-related information into the acoustic word embeddings. Experimental results show that the proposed system produces promising search results in the code-switching test scenario, and that search performance improves further as the number of templates increases and when the variability-invariant loss is used.
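The search step summarized above (embed templates and sliding-window segments as fixed-dimensional vectors, then compare them by distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trained CNN extractor is replaced by a hypothetical `embed` function (mean and standard-deviation pooling followed by L2 normalization), the distance is assumed to be cosine distance, the window length is assumed to equal the mean template length, and distances to multiple templates are assumed to be combined by averaging. The names `embed`, `sliding_search`, and `detect`, and the `hop` and `threshold` parameters, are all hypothetical.

```python
import numpy as np


def embed(segment: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained CNN embedding extractor.

    Maps a variable-length (frames, feat_dim) segment to a fixed-dimensional,
    L2-normalized vector by concatenating the per-dimension mean and standard
    deviation. A real system would use the CNN's embedding output instead.
    """
    vec = np.concatenate([segment.mean(axis=0), segment.std(axis=0)])
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Inputs are unit length (or all zeros), so 1 - dot product is the cosine
    # distance; an all-zero embedding gives a distance of 1.0.
    return 1.0 - float(np.dot(a, b))


def sliding_search(templates, content, hop=5):
    """Score every sliding window of `content` against one keyword.

    `templates` is a list of (frames, feat_dim) arrays for the same keyword.
    The window length is assumed to be the mean template length. Returns a
    list of (start_frame, distance) pairs, with distances averaged over all
    templates (an assumed fusion rule).
    """
    win = int(round(np.mean([len(t) for t in templates])))
    template_vecs = [embed(t) for t in templates]
    scores = []
    for start in range(0, len(content) - win + 1, hop):
        seg_vec = embed(content[start:start + win])
        dist = float(np.mean([cosine_distance(t, seg_vec) for t in template_vecs]))
        scores.append((start, dist))
    return scores


def detect(templates, content, hop=5, threshold=0.1):
    """Return the start frames whose averaged distance is below `threshold`."""
    return [s for s, d in sliding_search(templates, content, hop) if d < threshold]


if __name__ == "__main__":
    # Toy 2-dimensional "features": silence everywhere except frames 40-59.
    content = np.zeros((100, 2))
    content[40:60] = [1.0, 0.0]
    templates = [np.tile([1.0, 0.0], (20, 1)), np.tile([1.0, 0.0], (20, 1))]
    print(min(sliding_search(templates, content), key=lambda x: x[1]))  # (40, 0.0)
    print(detect(templates, content))  # [40]
```

Averaging the distances over templates is only one way to use several examples of a keyword; taking the minimum distance over templates is another option. The abstract states only that search performance improves as the number of templates grows, not which fusion rule is used.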