Paper Title
Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning
Paper Authors
Paper Abstract
Automatic Speech Recognition (ASR) systems typically produce unpunctuated transcripts that have poor readability. In addition, building a punctuation restoration system is challenging for low-resource languages, especially for domain-specific applications. In this paper, we propose a Spanish punctuation restoration system designed for a real-time customer support transcription service. To address the data sparsity of Spanish transcripts in the customer support domain, we introduce two transfer-learning-based strategies: 1) domain adaptation using out-of-domain Spanish text data; 2) cross-lingual transfer learning leveraging in-domain English transcript data. Our experimental results show that these strategies improve the accuracy of the Spanish punctuation restoration system.