Paper Title
Analysis of Co-Laughter Gesture Relationship on RGB Videos in Dyadic Conversation Context
Paper Authors
Paper Abstract
The development of virtual agents has enabled human-avatar interactions to become increasingly rich and varied. Moreover, an expressive virtual agent, i.e., one that mimics the natural expression of emotions, enhances social interaction between a user (human) and an agent (intelligent machine). The set of non-verbal behaviors of a virtual character is therefore an important component in the context of human-machine interaction. Laughter is not just an audio signal but an intrinsically multimodal form of non-verbal communication: in addition to audio, it includes facial expressions and body movements. Motion analysis often relies on a relevant motion-capture dataset, but the main issue is that acquiring such a dataset is expensive and time-consuming. This work studies the relationship between laughter and body movements in dyadic conversations. Body movements were extracted from videos using a deep-learning-based pose estimation model. We found that, in the explored NDC-ME dataset, single statistical features of a joint movement (i.e., the maximum value, or the maximum of the Fourier transform) correlate only weakly, at about 30%, with laughter intensity. However, we did not find a direct correlation between audio features and body movements. We discuss the challenges of using such a dataset for the audio-driven co-laughter motion synthesis task.
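To make the kind of analysis described above concrete, here is a minimal sketch of how such features could be correlated with laughter intensity. It is not the authors' pipeline: the segment signals and intensity labels below are random placeholders, and `joint_features` is a hypothetical helper that computes the two statistics named in the abstract (maximum value and maximum of the Fourier spectrum) before measuring their Pearson correlation with intensity.

```python
import numpy as np
from scipy.stats import pearsonr

def joint_features(trajectory):
    """Two statistics of a 1-D joint-movement signal: the maximum
    absolute displacement and the peak magnitude of its Fourier
    spectrum. The mean is subtracted and the DC bin skipped so the
    spectral peak reflects oscillatory motion, not a constant offset."""
    spectrum = np.abs(np.fft.rfft(trajectory - trajectory.mean()))
    return np.max(np.abs(trajectory)), np.max(spectrum[1:])

# Hypothetical inputs: one movement signal per laughter segment
# (e.g., a wrist joint's displacement over 120 frames) and each
# segment's annotated laughter intensity. Random data stands in
# for the real NDC-ME extractions.
rng = np.random.default_rng(0)
segments = [rng.standard_normal(120) for _ in range(50)]
intensity = rng.uniform(0.0, 1.0, size=50)

# Compute both features for every segment, then correlate each
# feature column with the intensity labels.
feats = np.array([joint_features(s) for s in segments])
max_vals, fft_vals = feats[:, 0], feats[:, 1]

r_max, _ = pearsonr(max_vals, intensity)
r_fft, _ = pearsonr(fft_vals, intensity)
print(f"corr(max displacement, intensity) = {r_max:.2f}")
print(f"corr(FFT peak, intensity)         = {r_fft:.2f}")
```

On real data, a correlation of roughly 0.3 for either feature would match the weak relationship the abstract reports; on the placeholder data above, the correlations are near zero by construction.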