Paper title
EmoInHindi: A Multi-label Emotion and Intensity Annotated Dataset in Hindi for Emotion Recognition in Dialogues
Paper authors
Paper abstract
A long-standing goal of Artificial Intelligence (AI) has been to create human-like conversational systems. Such systems should be able to develop an emotional connection with their users, which makes emotion recognition in dialogues an important task. Emotion detection in dialogues is challenging because humans usually convey multiple emotions with varying degrees of intensity in a single utterance. Moreover, the emotion in an utterance of a dialogue may depend on previous utterances, making the task more complex. Emotion recognition has long been in great demand. However, most of the existing datasets for multi-label emotion and intensity detection in conversations are in English. To this end, we create EmoInHindi, a large conversational dataset in Hindi for multi-label emotion and intensity recognition in conversations, containing 1,814 dialogues with a total of 44,247 utterances. We prepare our dataset in a Wizard-of-Oz manner for mental health and legal counselling of crime victims. Each utterance of a dialogue is annotated with one or more emotion categories from 16 emotion classes, including neutral, and their corresponding intensity values. We further propose strong contextual baselines that can detect the emotion(s) and the corresponding intensity of an utterance given the conversational context.