Paper title
The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs
Paper authors
Abstract
In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations. We describe recent advances in the development of lemmatizers and test them against the Capitularies corpus (comprising Frankish royal edicts, mid-6th to mid-9th century), a corpus created as a reference for processing Medieval Latin. We also consider the post-correction of lemmatizations using a limited crowdsourcing process aimed at continuous review and updating of the FLL. Starting from the texts resulting from this lemmatization process, we describe the extension of the FLL by means of word embeddings, whose interactive traversal by means of SemioGraphs completes the digitally enhanced hermeneutic circle. In this way, the article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-corrections and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.