Paper Title

Bracketing Encodings for 2-Planar Dependency Parsing

Authors

Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez

Abstract

We present a bracketing-based encoding that can be used to represent any 2-planar dependency tree over a sentence of length n as a sequence of n labels, hence providing almost total coverage of crossing arcs in sequence labeling parsing. First, we show that existing bracketing encodings for parsing as labeling can only handle a very mild extension of projective trees. Second, we overcome this limitation by taking into account the well-known property of 2-planarity, which is present in the vast majority of dependency syntactic structures in treebanks, i.e., the arcs of a dependency tree can be split into two planes such that arcs in a given plane do not cross. We take advantage of this property to design a method that balances the brackets and that encodes the arcs belonging to each of those planes, allowing for almost unrestricted non-projectivity (around 99.9% coverage) in sequence labeling parsing. The experiments show that our linearizations improve over the accuracy of the original bracketing encoding in highly non-projective treebanks (on average by 0.4 LAS), while achieving a similar speed. Also, they are especially suitable when PoS tags are not used as input parameters to the models.
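
To make the idea in the abstract concrete, here is a minimal Python sketch of a two-plane bracket encoding. It is an illustration under stated assumptions, not the authors' implementation: the function names (`crosses`, `assign_planes`, `encode`, `decode`), the `*` marker for second-plane brackets, the `.` placeholder for an empty label, and the choice to attach each bracket to the word at the arc's endpoint are all hypothetical, and the paper's actual label layout may differ. Arcs from the artificial root are left unencoded (a word that never receives a head is read back as a root dependent), and planes are found by 2-colouring the crossing graph, which is not necessarily the plane assignment strategy used in the paper.

```python
import re
from collections import deque


def crosses(a, b):
    """True if two arcs, given as (left, right) spans, cross each other."""
    (l1, r1), (l2, r2) = sorted([a, b])
    return l1 < l2 < r1 < r2


def assign_planes(spans):
    """2-colour the crossing graph by BFS; raise if no 2-plane split exists."""
    plane = [None] * len(spans)
    for start in range(len(spans)):
        if plane[start] is not None:
            continue
        plane[start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(len(spans)):
                if not crosses(spans[i], spans[j]):
                    continue
                if plane[j] is None:
                    plane[j] = 1 - plane[i]
                    queue.append(j)
                elif plane[j] == plane[i]:
                    raise ValueError("tree is not 2-planar")
    return plane


def encode(heads):
    """heads[i] is the head of word i+1 (1-based); 0 marks the root."""
    n = len(heads)
    # Assumption of this sketch: root arcs are not encoded as brackets.
    arcs = [(h, d) for d, h in enumerate(heads, start=1) if h != 0]
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    planes = assign_planes(spans)
    closers = [[] for _ in range(n + 1)]
    openers = [[] for _ in range(n + 1)]
    for (h, d), p in zip(arcs, planes):
        mark = "*" if p == 1 else ""  # second-plane brackets carry a star
        if h < d:  # right arc: head opens "/", dependent closes ">"
            openers[h].append("/" + mark)
            closers[d].append(">" + mark)
        else:      # left arc: dependent opens "<", head closes "\"
            openers[d].append("<" + mark)
            closers[h].append("\\" + mark)
    # Closing brackets come first so a word can end one arc and start another.
    return ["".join(closers[i] + openers[i]) or "." for i in range(1, n + 1)]


def decode(labels):
    """Rebuild the head list with one stack per (bracket type, plane)."""
    heads = [0] * len(labels)
    stacks = {}
    for i, label in enumerate(labels, start=1):
        for tok in re.findall(r"[<>/\\]\*?", label):
            kind, mark = tok[0], tok[1:]
            if kind in "/<":
                stacks.setdefault((kind, mark), []).append(i)
            elif kind == ">":
                heads[i - 1] = stacks[("/", mark)].pop()
            else:  # "\" closes the most recent "<" in the same plane
                dep = stacks[("<", mark)].pop()
                heads[dep - 1] = i
    return heads


if __name__ == "__main__":
    heads = [0, 4, 1, 1, 3]  # a small non-projective, 2-planar tree
    labels = encode(heads)
    print(labels)
    print(decode(labels) == heads)
```

The sketch yields one label per word, so a sequence labeling model would predict n labels for a sentence of length n. Because arcs within a plane never cross, each plane's brackets stay balanced and can be matched with a simple stack during decoding.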
