Paper Title
Sequence-to-Sequence Learning for Indonesian Automatic Question Generator
Paper Authors
Paper Abstract
Automatic question generation is the task of automatically creating questions from various kinds of textual data. Research on automatic question generators (AQG) has been conducted for more than 10 years, focusing mainly on factoid questions. In all these studies, state-of-the-art results are attained with the sequence-to-sequence approach. However, AQG systems for Indonesian have not been studied in depth. In this work, we construct an Indonesian automatic question generator, adapting the architecture from several previous works. In summary, we use a sequence-to-sequence approach with BiGRU, BiLSTM, and Transformer models, together with additional linguistic features, a copy mechanism, and a coverage mechanism. Since there is no large, popular public Indonesian dataset for question generation, we translated the SQuAD v2.0 factoid question answering dataset and used the Indonesian TyDiQA dev set as an additional test set. The system achieved BLEU1, BLEU2, BLEU3, BLEU4, and ROUGE-L scores of 38.35, 20.96, 10.68, 5.78, and 43.4 on SQuAD, and 39.9, 20.78, 10.26, 6.31, and 44.13 on TyDiQA, respectively. The system performed well when the expected answers are named entities and are syntactically close to the context explaining them. Additionally, from a native Indonesian speaker's perspective, the best questions generated by our best models in their best cases are acceptable and reasonably useful.
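The abstract reports BLEU1 to BLEU4 and ROUGE-L but does not say how they were computed. The following is a minimal sketch of the two quantities these metrics are built on: clipped n-gram precision (the core of BLEU-n, shown here without the brevity penalty or the geometric mean over n-gram orders) and the longest-common-subsequence F-score behind ROUGE-L. The function names, whitespace tokenization, the choice of beta = 1, and the example sentences are all assumptions made for illustration; they are not taken from the paper or its evaluation scripts.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Modified n-gram precision, the core quantity inside BLEU-n."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    matched = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / total


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as the F1 of LCS-based precision and recall (beta = 1 assumed)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical Indonesian question pair, not taken from the paper's data.
reference = "siapa presiden pertama indonesia ?".split()
candidate = "siapa presiden indonesia ?".split()

print(clipped_precision(candidate, reference, 1))
print(clipped_precision(candidate, reference, 2))
print(rouge_l_f1(candidate, reference))
```

In this example the generated question drops one word from the reference, so every candidate unigram still matches while only two of its three bigrams do. This is the same pattern as in the reported scores, where BLEU falls steeply from BLEU1 to BLEU4 while ROUGE-L, which tolerates gaps, stays comparatively high.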