论文标题
树木平面线性化中的边缘长度的预期总和。理论和应用
The expected sum of edge lengths in planar linearizations of trees. Theory and applications
论文作者
论文摘要
依赖树已被证明是一个非常成功的模型,以代表人类语言句子的句法结构。在这些结构中,顶点是单词和边缘连接句法依赖的单词。这些依赖关系的趋势已使用随机基线来证明边缘或其变体的长度之和。无处不在的基线是投影订单中的预期总和(其中边缘不交叉,句子的根单词不被任何边缘覆盖),可以在时间$ o o(n)$中计算出来。在这里,我们专注于较弱的正式约束,即平面性。在理论领域中,我们提出了平面性的特征,鉴于句子,该域将产生平面排列的数量或有效的算法,以生成单词的均匀随机平面排列。我们还展示了平面布置中的预期总和与投影安排中的预期总和之间的关系。在应用程序域中,我们得出了A $ O(n)$ - 时间算法来计算边缘长度总和的预期值。我们还将这项研究应用于平行语料库,发现实际依赖距离和随机基线之间的差距会随着对依赖性结构的形式约束的强度增加,这表明形式约束吸收了依赖距离的一部分最小化效果。我们的研究铺平了使用随机平面线性化作为随机基线的随机平面线性化来复制过去研究的方法。
Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically-dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or its variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), that can be computed in time $O(n)$. Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive a $O(n)$-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as random baseline.