Paper title
A New Aligned Simple German Corpus
Paper authors
Paper abstract
"Leichte Sprache", the German counterpart to Simple English, is a regulated language that aims to make complex written language accessible to groups of people who would otherwise be unable to understand it. We present a new sentence-aligned monolingual corpus for Simple German and German. It comprises multiple document-aligned sources, which we have aligned at the sentence level using automatic sentence-alignment methods. We evaluate our alignments against a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by F1-score, surpasses that of previous work. We publish the dataset under the CC BY-SA license and the accompanying code under the MIT license.
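The abstract reports alignment quality as an F1-score against manually labelled alignments but does not spell out the computation. The following is a minimal sketch of one common way to score sentence alignments, not the authors' evaluation code. It assumes each alignment is a pair of sentence indices (Simple German index, German index) and that a predicted pair counts as correct only if it matches a gold pair exactly. The function name `alignment_f1` and the toy data are hypothetical.

```python
def alignment_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. gold alignment pairs.

    Each alignment is assumed to be a (simple_idx, standard_idx) tuple;
    this pair representation is an illustrative assumption.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # pairs found in both sets

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: four manually labelled pairs, three predicted pairs.
gold_pairs = [(0, 0), (1, 2), (2, 3), (3, 5)]
predicted_pairs = [(0, 0), (1, 2), (2, 4)]
precision, recall, f1 = alignment_f1(predicted_pairs, gold_pairs)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In the toy data, two of the three predicted pairs are correct and two of the four gold pairs are recovered, so precision is 2/3, recall is 1/2, and F1 is 4/7.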