Paper title
The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia
Authors
Abstract
The expression of sentiment in parliamentary debates is deemed to be significantly different from that on social media or in product reviews. This paper adds to an emerging body of research on parliamentary debates with a dataset of sentences annotated for sentiment polarity in political discourse. We sample the sentences for annotation from the proceedings of three Southeast European parliaments: those of Croatia, Bosnia-Herzegovina, and Serbia. A six-level schema is applied to the data with the aim of training a classification model for the detection of sentiment in parliamentary proceedings. Krippendorff's alpha, which measures inter-annotator agreement, ranges from 0.6 for the six-level annotation schema to 0.75 for the three-level schema and 0.83 for the two-level schema. Our initial experiments on the dataset show that transformer models perform significantly better than models with simpler architectures. Furthermore, despite the similarity of the three languages, we observe differences in performance across languages. Parliament-specific training and evaluation shows that the main reason for the differing performance between parliaments seems to be the differing complexity of the automatic classification task, which is not observable in annotator performance. Language distance does not seem to play any role in either annotator or automatic classification performance. We release the dataset and the best-performing model under permissive licences.
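To illustrate how agreement can rise when a fine-grained schema is collapsed into a coarser one, the following is a minimal sketch of Krippendorff's alpha for nominal labels. It is not the authors' code. The abstract does not state which distance metric was used, so the nominal variant is an assumption, and the integer labels 0 to 5 and the `TO_THREE` and `TO_TWO` mappings are hypothetical stand-ins for the six-, three-, and two-level schemas.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists: the labels given to one sentence by its
    annotators. Sentences with fewer than two labels are skipped.
    """
    # Coincidence matrix: each ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of labels in that unit.
    coincidences = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        for a, b in permutations(ratings, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label and overall number of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("need at least one unit with two or more labels")

    # Observed and expected disagreement, both scaled by the common factor n.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label ever used, so no disagreement is possible
    return 1 - observed / expected


# Hypothetical mappings from a six-level scale (0 = most negative,
# 5 = most positive) to coarser schemas. These are illustrative assumptions.
TO_THREE = {0: "negative", 1: "negative", 2: "neutral",
            3: "neutral", 4: "positive", 5: "positive"}
TO_TWO = {0: "negative", 1: "negative", 2: "negative",
          3: "positive", 4: "positive", 5: "positive"}


def collapse(units, mapping):
    """Relabel every annotation with a coarser schema."""
    return [[mapping[label] for label in ratings] for ratings in units]


if __name__ == "__main__":
    # Toy data: two annotators who often differ by one step on the fine scale.
    toy = [[0, 1], [2, 3], [4, 5], [1, 1], [5, 4], [2, 2]]
    print("six-level:  ", krippendorff_alpha_nominal(toy))
    print("three-level:", krippendorff_alpha_nominal(collapse(toy, TO_THREE)))
    print("two-level:  ", krippendorff_alpha_nominal(collapse(toy, TO_TWO)))
```

Disagreements between adjacent fine-grained labels disappear when both labels map to the same coarse class, which is one reason agreement is typically higher under a coarser schema, consistent with the pattern reported in the abstract.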