Paper Title

Changing the Representation: Examining Language Representation for Neural Sign Language Production

Authors

Harry Walsh, Ben Saunders, Richard Bowden

Abstract

Neural Sign Language Production (SLP) aims to automatically translate spoken language sentences into sign language videos. Historically, the SLP task has been broken into two steps: first, translating a spoken language sentence into a gloss sequence, and second, producing a sign language video from that sequence of glosses. In this paper we apply Natural Language Processing techniques to the first step of the SLP pipeline. We use language models such as BERT and Word2Vec to create better sentence-level embeddings, and apply several tokenization techniques, demonstrating how these improve performance on the low-resource translation task of Text to Gloss. We introduce Text to HamNoSys (T2H) translation, and show the advantages of using a phonetic representation for sign language translation rather than a sign-level gloss representation. Furthermore, we use HamNoSys to extract the hand shape of a sign and use this as additional supervision during training, further increasing the performance on T2H. Assembling best practice, we achieve a BLEU-4 score of 26.99 on the MineDGS dataset and 25.09 on PHOENIX14T, two new state-of-the-art baselines.