论文标题
学习结构上明确的概率语法
Learning of Structurally Unambiguous Probabilistic Grammars
论文作者
论文摘要
识别概率上下文不含语法的问题有两个方面:第一个是确定语法的拓扑(语法规则),第二个是估计每个规则的概率权重。考虑到一般来说,尤其是学习无上下文语法的硬度结果,尤其是概率语法,大多数文献都集中在第二个问题上。在这项工作中,我们解决了第一个问题。我们将注意力限制在结构上明确的加权语法(SUWCFG)上,并为结构上明确的概率无上下文语法(SUPCFG)提供了查询学习算法。我们证明可以使用共线性多重树自动机(CMTA)来表示SUWCFG,并提供一种学习CMTA的多项式学习算法。我们表明,可以将学习的CMTA转换为概率语法,从而提供了一种完整的算法,用于学习结构明确的概率上下文无语法(语法拓扑和概率权重),并使用结构化的成员资格查询和结构的等价Queries。我们证明了算法在学习基因组数据上学习PCFG的有用性。
The problem of identifying a probabilistic context free grammar has two aspects: the first is determining the grammar's topology (the rules of the grammar) and the second is estimating probabilistic weights for each rule. Given the hardness results for learning context-free grammars in general, and probabilistic grammars in particular, most of the literature has concentrated on the second problem. In this work we address the first problem. We restrict attention to structurally unambiguous weighted context-free grammars (SUWCFG) and provide a query learning algorithm for structurally unambiguous probabilistic context-free grammars (SUPCFG). We show that SUWCFG can be represented using co-linear multiplicity tree automata (CMTA), and provide a polynomial learning algorithm that learns CMTAs. We show that the learned CMTA can be converted into a probabilistic grammar, thus providing a complete algorithm for learning a structurally unambiguous probabilistic context free grammar (both the grammar topology and the probabilistic weights) using structured membership queries and structured equivalence queries. We demonstrate the usefulness of our algorithm in learning PCFGs over genomic data.