Paper Title
Discovering Textual Structures: Generative Grammar Induction using Template Trees
Paper Authors
Abstract
Natural language generation provides designers with methods for automatically generating text, e.g. for creating summaries, chatbots and game content. In practice, text generators are often either learned and hard to interpret, or created by hand using techniques such as grammars and templates. In this paper, we introduce a novel grammar induction algorithm for learning interpretable grammars for generative purposes, called Gitta. We also introduce the novel notion of template trees to discover latent templates in corpora to derive these generative grammars. Using existing human-created grammars, we found that the algorithm can reasonably approximate these grammars using only a few examples. These results indicate that Gitta could be used to automatically learn interpretable and easily modifiable grammars, and thus provide a stepping stone for human-machine co-creation of generative models.
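The following toy sketch illustrates what "discovering a latent template" and turning it into a generative grammar can mean. It is not Gitta's algorithm, and the abstract does not describe that algorithm in detail. The sketch makes one simplifying assumption: every example has the same number of whitespace-separated tokens. Tokens shared at a position stay fixed, and positions that differ become slots. The function names `extract_template` and `expand` and the `<SLOT_i>` notation are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical notation: a template is a token list in which "<SLOT_i>" marks a slot.
Template = List[str]
Slots = Dict[str, List[str]]


def extract_template(sentences: List[str]) -> Tuple[Template, Slots]:
    """Toy template discovery: align same-length sentences position by position.

    Positions where all sentences agree keep their token; positions where they
    differ become a slot whose possible fillers are the observed tokens.
    """
    token_lists = [s.split() for s in sentences]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("this toy sketch only aligns sentences with equal token counts")
    template: Template = []
    slots: Slots = {}
    for position in zip(*token_lists):
        if len(set(position)) == 1:
            template.append(position[0])  # shared token: fixed text in the template
        else:
            name = f"<SLOT_{len(slots)}>"
            slots[name] = sorted(set(position))  # observed fillers for this slot
            template.append(name)
    return template, slots


def expand(template: Template, slots: Slots) -> List[str]:
    """Enumerate every string the induced one-rule grammar can generate."""
    choices = [slots.get(token, [token]) for token in template]
    return [" ".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    examples = ["I like my cat", "I like my dog", "I love my dog"]
    template, slots = extract_template(examples)
    print(template)  # ['I', '<SLOT_0>', 'my', '<SLOT_1>']
    print(expand(template, slots))
```

The three examples yield the template `I <SLOT_0> my <SLOT_1>`. Expanding it also produces "I love my cat", which was not among the inputs. This shows in miniature how an induced grammar can generalize from a few examples while staying human-readable and easy to edit.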