Unsilencing Colonial Archives via Automated Entity Recognition

Paper title

Unsilencing Colonial Archives via Automated Entity Recognition

Authors

Mrinalini Luthra, Konstantin Todorov, Charles Jeurgens, Giovanni Colavizza

Abstract

Colonial archives are at the center of increased interest from a variety of perspectives, as they contain traces of historically marginalized people. Unfortunately, like most archives, they remain difficult to access due to significant persisting barriers. We focus here on one of them: the biases to be found in historical finding aids, such as indexes of person names, which remain in use to this day. In colonial archives, indexes can perpetuate silences by omitting mentions of historically marginalized persons. In order to overcome such limitations and pluralize the scope of existing finding aids, we propose using automated entity recognition. To this end, we contribute a fit-for-purpose annotation typology and apply it to the colonial archive of the Dutch East India Company (VOC). We release a corpus of nearly 70,000 annotations as a shared task, for which we provide baselines using state-of-the-art neural network models. Our work intends to stimulate further contributions in the direction of broadening access to (colonial) archives, integrating automation as a possible means to this end.
