Optimizing text representations to capture (dis)similarity between political parties

Paper title

Optimizing text representations to capture (dis)similarity between political parties

Paper authors

Tanise Ceron, Nico Blokker, Sebastian Padó

Abstract

Even though fine-tuned neural language models have been pivotal in enabling "deep" automatic text analysis, optimizing text representations for specific applications remains a crucial bottleneck. In this study, we look at this problem in the context of a task from computational social science, namely modeling pairwise similarities between political parties. Our research question is what level of structural information is necessary to create robust text representations. We contrast a strongly informed approach (which uses both claim span and claim category annotations) with approaches that forgo one or both types of annotation and rely instead on document structure-based heuristics. We evaluate our models on the manifestos of German parties for the 2021 federal election. We find that heuristics that maximize within-party over between-party similarity, along with a normalization step, lead to reliable party similarity prediction without the need for manual annotation.
