Paper Title
Score-level Multi Cue Fusion for Sign Language Recognition
Paper Authors
Paper Abstract
Sign Languages are expressed through hand and upper body gestures as well as facial expressions. Therefore, Sign Language Recognition (SLR) needs to focus on all such cues. Previous work uses hand-crafted mechanisms or network aggregation to extract the different cue features and increase SLR performance, which is slow and involves complicated architectures. We propose a more straightforward approach that trains separate cue models specializing in the dominant hand, both hands, face, and upper body regions. We compare the performance of 3D Convolutional Neural Network (CNN) models specializing in these regions, combine them through score-level fusion, and also evaluate a weighted fusion alternative. Our experimental results show the effectiveness of mixed convolutional models: their fusion yields up to a 19% accuracy improvement over the baseline that uses the full upper body. Furthermore, we include a discussion of fusion settings, which can help future work on Sign Language Translation (SLT).
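To make the fusion scheme concrete, below is a minimal sketch of score-level fusion of per-cue classifiers, as described in the abstract. The cue names, weights, and randomly generated score matrices are illustrative assumptions for this sketch, not values or code from the paper.

```python
import numpy as np

def fuse_scores(cue_scores, weights=None):
    """Fuse per-cue class-score matrices (each of shape
    [num_samples, num_classes]) by a weighted average.
    With weights=None this reduces to plain score averaging."""
    cues = list(cue_scores)
    if weights is None:
        weights = {c: 1.0 / len(cues) for c in cues}  # unweighted fusion
    total = sum(weights[c] for c in cues)
    fused = sum(weights[c] * cue_scores[c] for c in cues) / total
    return fused

# Hypothetical example: four cue models (dominant hand, both hands,
# face, upper body) scoring 2 samples over 5 sign classes.
rng = np.random.default_rng(0)
scores = {c: rng.dirichlet(np.ones(5), size=2)
          for c in ["dominant_hand", "hands", "face", "upper_body"]}
fused = fuse_scores(scores, weights={"dominant_hand": 0.4, "hands": 0.3,
                                     "face": 0.1, "upper_body": 0.2})
print(fused.argmax(axis=1))  # fused class predictions per sample
```

In this setting each cue-specialized 3D CNN is trained independently, and only their output scores are combined at test time, which avoids the complicated multi-stream architectures the abstract contrasts against.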