论文标题
部分可观测时空混沌系统的无模型预测
German Parliamentary Corpus (GerParCor)
论文作者
论文摘要
储层计算是预测湍流的有力工具,其简单的架构具有处理大型系统的计算效率。然而,其实现通常需要完整的状态向量测量和系统非线性知识。我们使用非线性投影函数将系统测量扩展到高维空间,然后将其输入到储层中以获得预测。我们展示了这种储层计算网络在时空混沌系统上的应用,该系统模拟了湍流的若干特征。我们表明,使用径向基函数作为非线性投影器,即使只有部分观测并且不知道控制方程,也能稳健地捕捉复杂的系统非线性。最后,我们表明,当测量稀疏、不完整且带有噪声,甚至控制方程变得不准确时,我们的网络仍然可以产生相当准确的预测,从而为实际湍流系统的无模型预测铺平了道路。
Parliamentary debates represent a large and partly unexploited treasure trove of publicly accessible texts. In the German-speaking area, there is a certain deficit of uniformly accessible and annotated corpora covering all German-speaking parliaments at the national and federal level. To address this gap, we introduce the German Parliament Corpus (GerParCor). GerParCor is a genre-specific corpus of (predominantly historical) German-language parliamentary protocols from three centuries and four countries, including state and federal level data. In addition, GerParCor contains conversions of scanned protocols and, in particular, of protocols in Fraktur converted via an OCR process based on Tesseract. All protocols were preprocessed by means of the NLP pipeline of spaCy3 and automatically annotated with metadata regarding their session date. GerParCor is made available in the XMI format of the UIMA project. In this way, GerParCor can be used as a large corpus of historical texts in the field of political communication for various tasks in NLP.